Acts on the audit's NEXT block — operational resilience.
Backups (N1):
- New `backup` compose service (postgres:16-alpine) runs scripts/backup-loop.sh:
immediate pg_dump on start, then nightly, gzip, 14-day rotation into
./backups on the host. Configurable via BACKUP_RETENTION_DAYS /
BACKUP_INTERVAL_SECONDS. (Offsite copy is the documented next step.)
Resource limits + healthchecks (N2):
- deploy.resources.limits.memory on postgres (2g), app (1500m), nginx (256m),
backup (256m) so no container can starve the others (the Nginx outage was a
reminder).
- Nginx now has a healthcheck hitting a new self-served `/nginx-health`
endpoint on the default_server (no upstream dependency).
Chat resilience (N3):
- buildSystemPrompt() wraps its 4 Prisma queries in try/catch with safe
defaults — if Postgres is down the assistant degrades instead of 500-ing.
- Result is cached for 60s (only on healthy builds) so we don't run 4 queries
per message; CMS edits still show up once the 60s TTL expires.
- POST fails fast with 503 if OPENAI_API_KEY is missing (instead of breaking
mid-stream after headers are sent).
- streamText gets an onError handler that logs + persists an `error` AiEvent.
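A minimal sketch of the fallback-plus-cache pattern described in the first two bullets, not the actual implementation. The loader functions, the two-field prompt, and the default values are hypothetical stand-ins for the 4 Prisma queries; only the 60s TTL and the "cache only healthy builds" rule come from this commit.

```typescript
// Hypothetical loaders standing in for the real Prisma queries.
type PromptLoaders = {
  companyName: () => Promise<string>;
  services: () => Promise<string[]>;
};

const TTL_MS = 60_000; // 60s TTL, as stated in the commit message
let cached: { prompt: string; expiresAt: number } | null = null;

// Runs one query; on failure logs, returns the fallback, and reports ok=false.
async function safeQuery<T>(
  run: () => Promise<T>,
  fallback: T,
): Promise<{ value: T; ok: boolean }> {
  try {
    return { value: await run(), ok: true };
  } catch (err) {
    console.error("prompt query failed, using default:", err);
    return { value: fallback, ok: false };
  }
}

async function buildSystemPrompt(
  loaders: PromptLoaders,
  now: number = Date.now(),
): Promise<string> {
  // Serve the cached prompt while it is still fresh.
  if (cached && cached.expiresAt > now) return cached.prompt;

  const [name, services] = await Promise.all([
    safeQuery(loaders.companyName, "our company"),
    safeQuery(loaders.services, [] as string[]),
  ]);

  const list = services.value.length > 0 ? services.value.join(", ") : "n/a";
  const prompt = `You are the assistant for ${name.value}. Services: ${list}.`;

  // Cache only healthy builds so a degraded prompt is not pinned for 60s.
  if (name.ok && services.ok) cached = { prompt, expiresAt: now + TTL_MS };
  return prompt;
}
```

Passing `now` as a parameter is only there to make the expiry logic easy to exercise without waiting; a real handler would rely on the `Date.now()` default.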
Idempotent submissions (N4):
- consultation/route.ts and operations.ts now wrap the email-tracking UPDATE
in try/catch — the lead/signal is already saved, so a telemetry hiccup can't
500 the request and trigger a duplicate retry. operations.ts also returns
emailError.
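The idea can be sketched as follows. This is an illustration under assumptions, not the code in consultation/route.ts or operations.ts: `submitLead`, `saveLead`, `sendEmail`, and `markEmailSent` are hypothetical names, and it is assumed that `emailError` carries the message from a failed email send.

```typescript
type SubmitResult = { ok: true; leadId: string; emailError: string | null };

async function submitLead(
  saveLead: () => Promise<string>,                 // hypothetical: persists the lead, returns its id
  sendEmail: (leadId: string) => Promise<void>,    // hypothetical: sends the notification email
  markEmailSent: (leadId: string) => Promise<void>, // hypothetical: the email-tracking UPDATE
): Promise<SubmitResult> {
  // A failure here still propagates: without a saved lead there is nothing to report.
  const leadId = await saveLead();

  let emailError: string | null = null;
  try {
    await sendEmail(leadId);
  } catch (err) {
    emailError = err instanceof Error ? err.message : String(err);
  }

  if (emailError === null) {
    try {
      await markEmailSent(leadId);
    } catch (err) {
      // Telemetry only: log and continue so the client sees success and does not retry.
      console.error("email tracking update failed:", err);
    }
  }

  return { ok: true, leadId, emailError };
}
```

Because the lead is already persisted before the tracking update runs, swallowing that one failure is what keeps a client retry from creating a duplicate row.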
Performance (N5):
- Index GlobalNode(application, isActive) — backs the case-study join on every
application page. Migration 20260609130000_index_globalnode_application.
Verified: next build compiles (Docker parity, SESSION_SECRET unset),
TypeScript clean, prisma schema valid, golden tests 17/17,
`docker compose config` valid.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The public/team/ and public/branding/ upload directories are owned by root
on the VPS (the container entrypoint chowns mounted volumes to 1001:1001),
so `git reset --hard` failed trying to create public/team/.gitkeep as the
deploy user.
Treat them like the other upload dirs (applications, cases, news, parts,
operations-inbox, footage) — gitignored, created by the Docker volume
mount at runtime. Removes public/team/.gitkeep from tracking and adds
public/team/ + public/branding/ to .gitignore.
This unblocks clean `git reset --hard origin/main` on the VPS.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Eliminates the need to run "docker compose build" after uploading
images via HQ Command. Heritage page now respects light/dark mode.
CACHE INVALIDATION
- New helper src/lib/revalidate.ts called from /api/assets and
/api/public-upload after every upload, delete, and folder creation
- Pages switch from force-dynamic to ISR with revalidate=60
(regenerated on demand whenever content changes, plus a 60s time-based fallback)
- Nginx now sends "max-age=300, must-revalidate" instead of "expires 30d"
on /cases/, /applications/, /news/, /parts/, /footage/, /operations-inbox/
so browsers revalidate via If-Modified-Since (304s on unchanged files)
- Next.js Image Optimizer is aligned to the same TTL via minimumCacheTTL=300,
and Nginx gains a /_next/image location block for correct headers
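A rough sketch of what an on-demand revalidation helper of this kind could look like. The folder-to-page mapping and the function name are hypothetical, and the revalidation callback is injected so the sketch runs on its own; in a Next.js route handler one would likely pass `revalidatePath` from `next/cache`.

```typescript
// Hypothetical mapping from upload folder to the pages that render its files.
const PATHS_BY_FOLDER: Record<string, string[]> = {
  cases: ["/cases"],
  applications: ["/applications"],
  news: ["/news"],
  parts: ["/parts"],
};

type Revalidate = (path: string) => void;

// Revalidates every page that depends on the folder the changed file lives in.
// Returns the list of paths it touched, which is handy for logging.
function revalidateForUpload(filePath: string, revalidate: Revalidate): string[] {
  const folder = filePath.split("/").filter(Boolean)[0] ?? "";
  const paths = PATHS_BY_FOLDER[folder] ?? [];
  for (const p of paths) revalidate(p);
  return paths;
}
```

Calling this after each upload, delete, or folder creation is what lets pages regenerate immediately, with the 60s ISR interval left as a fallback.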
HERITAGE DARK/LIGHT FIX (Bug #8)
- Replaces hardcoded #0A0A0C / #00F0FF / text-white with proper
light + dark variants throughout markdown renderer (tables, lists,
headings, blockquotes, paragraphs, images)
- Hero section, navigation pill, and CMS-driven sections now switch
with the global theme toggle
SECURITY HARDENING
- Server actions bodySizeLimit reduced from 500MB to 50MB
(large uploads still go through /api/assets, which is bounded by the Nginx 500MB cap)
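For orientation, the two Next.js settings mentioned in this commit would typically sit in the Next config roughly as below. This is a sketch assuming a Next.js version where `bodySizeLimit` lives under `experimental.serverActions`; the project's actual config file is not shown here and may differ.

```typescript
// Sketch of the relevant next.config fragment (assumed layout, not the project's file).
const nextConfig = {
  experimental: {
    serverActions: {
      // Reduced from 500MB; large uploads go through /api/assets instead.
      bodySizeLimit: "50mb",
    },
  },
  images: {
    // Seconds; matches the 300s max-age Nginx sends for uploaded media.
    minimumCacheTTL: 300,
  },
};

export default nextConfig;
```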
DEPLOY NOTES
- Run on VPS:
git pull
docker compose up -d --build app
docker compose exec nginx nginx -s reload
- No DB schema changes in this commit. Existing 2FA users / data untouched.