flux-srl

davidherran/flux-srl

Commit Graph

Author	SHA1	Message	Date

davidherran

a81ee50ed8

feat(resilience): operational hardening (NEXT phase of the audit)

Deploy to VPS / deploy (push) Has been cancelled

Acts on the audit's NEXT block — operational resilience.

Backups (N1):
- New `backup` compose service (postgres:16-alpine) runs scripts/backup-loop.sh:
  immediate pg_dump on start, then nightly, gzip, 14-day rotation into
  ./backups on the host. Configurable via BACKUP_RETENTION_DAYS /
  BACKUP_INTERVAL_SECONDS. (Offsite copy is the documented next step.)
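
The retention rule can be illustrated in isolation. The real rotation lives in the shell script scripts/backup-loop.sh, which is not shown here; the sketch below only models the "older than N days" decision, and `Dump` and `expiredDumps` are hypothetical names.

```typescript
// Hypothetical description of one dump file in ./backups.
type Dump = { name: string; modifiedAt: Date };

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the names of dumps strictly older than the retention window.
// retentionDays corresponds to BACKUP_RETENTION_DAYS (14 in the commit message).
function expiredDumps(dumps: Dump[], retentionDays: number, now: Date): string[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return dumps
    .filter((dump) => dump.modifiedAt.getTime() < cutoff)
    .map((dump) => dump.name);
}
```

A dump that is exactly `retentionDays` old is kept under this rule; the actual script may draw the boundary differently.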

Resource limits + healthchecks (N2):
- deploy.resources.limits.memory on postgres (2g), app (1500m), nginx (256m),
  backup (256m) so no container can starve the others (the Nginx outage was a
  reminder).
- Nginx now has a healthcheck hitting a new self-served `/nginx-health`
  endpoint on the default_server (no upstream dependency).

Chat resilience (N3):
- buildSystemPrompt() wraps its 4 Prisma queries in try/catch with safe
  defaults — if Postgres is down the assistant degrades instead of 500-ing.
- Result is cached for 60s (only on healthy builds) so we don't run 4 queries
  per message; CMS edits still appear within the TTL.
- POST fails fast with 503 if OPENAI_API_KEY is missing (instead of breaking
  mid-stream after headers are sent).
- streamText gets an onError handler that logs + persists an `error` AiEvent.
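
The degrade-and-cache behaviour of buildSystemPrompt() could look roughly like the sketch below. It is not the repository's code: `makeCachedPromptBuilder`, `PromptData`, `SAFE_DEFAULTS`, the `load` callback (standing in for the 4 Prisma queries) and the injectable `now` clock are all assumptions made for illustration.

```typescript
// Hypothetical shape of whatever the four queries return.
type PromptData = { sections: string[] };

const SAFE_DEFAULTS: PromptData = { sections: [] };
const TTL_MS = 60_000; // 60s TTL, as stated in the commit message

// `load` stands in for the real queries; `now` is injectable so the TTL is testable.
function makeCachedPromptBuilder(
  load: () => Promise<PromptData>,
  now: () => number = Date.now,
): () => Promise<PromptData> {
  let cached: { value: PromptData; expiresAt: number } | null = null;

  return async function buildSystemPrompt(): Promise<PromptData> {
    // Serve from cache while the last healthy build is still fresh.
    if (cached !== null && now() < cached.expiresAt) {
      return cached.value;
    }
    try {
      const value = await load();
      // Cache only healthy builds, so an outage is never pinned for the full TTL.
      cached = { value, expiresAt: now() + TTL_MS };
      return value;
    } catch {
      // Database unavailable: degrade to defaults instead of failing the request.
      return SAFE_DEFAULTS;
    }
  };
}
```

Because failures are never cached, the first request after Postgres recovers rebuilds the prompt immediately rather than waiting for the TTL to expire.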

Idempotent submissions (N4):
- consultation/route.ts and operations.ts now wrap the email-tracking UPDATE
  in try/catch — the lead/signal is already saved, so a telemetry hiccup can't
  500 the request and trigger a duplicate retry. operations.ts also returns
  emailError.
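
A minimal sketch of the best-effort tracking pattern follows. The `Deps` interface, `submitLead`, and the three injected functions are hypothetical stand-ins for the real persistence, mail and tracking calls; wrapping the send itself in try/catch is also an assumption, since the commit message only names the tracking UPDATE.

```typescript
type SubmitResult = { ok: true; leadId: string; emailError: string | null };

// Hypothetical dependencies, injected so the sketch is self-contained.
interface Deps {
  saveLead: (payload: unknown) => Promise<string>;
  sendEmail: (leadId: string) => Promise<void>;
  markEmailStatus: (leadId: string, status: "sent" | "failed") => Promise<void>;
}

async function submitLead(payload: unknown, deps: Deps): Promise<SubmitResult> {
  // Persisting the lead is the only step allowed to fail the request.
  const leadId = await deps.saveLead(payload);

  let emailError: string | null = null;
  try {
    await deps.sendEmail(leadId);
  } catch (err) {
    emailError = err instanceof Error ? err.message : String(err);
  }

  // Tracking is telemetry only: swallow failures so the client sees success
  // and has no reason to retry an already-saved submission.
  try {
    await deps.markEmailStatus(leadId, emailError === null ? "sent" : "failed");
  } catch {
    // Intentionally ignored; the lead itself is already saved.
  }

  return { ok: true, leadId, emailError };
}
```

Returning `emailError` alongside a successful result lets the caller surface a delivery problem without signalling that the submission was lost.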

Performance (N5):
- Index GlobalNode(application, isActive) — backs the case-study join on every
  application page. Migration 20260609130000_index_globalnode_application.

Verified: next build compiles (Docker parity, SESSION_SECRET unset),
TypeScript clean, prisma schema valid, golden tests 17/17,
`docker compose config` valid.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-09 23:07:38 -05:00

davidherran

9b28f8ffaf

fix: nextjs primary group + auto-create asset folders on entity create

Deploy to VPS / deploy (push) Has been cancelled

THREE INTERLOCKING FIXES so editors stop hitting permission walls.

1) DOCKERFILE — gid 65533 (nogroup) on uploaded files
The container was creating files as 1001:65533 because Alpine's
`adduser --system --uid 1001 nextjs` is given no `--ingroup`, so the
user's primary group falls back to `nogroup` (gid 65533).
Files written through /api/assets ended up with `nogroup` ownership,
which surprised host sysadmins and made a host-side `chown -R 1001:1001`
appear to revert on each fresh container start.

Fix: `adduser --system --uid 1001 --ingroup nodejs nextjs`. Now
every file written by the container is 1001:1001 (nextjs:nodejs),
matching the host conventions and the existing chown automation.

2) ENTRYPOINT — recursively normalise existing files
The recursive chown in scripts/docker-entrypoint.sh now sweeps every
subfolder of /app/public/branding|footage|applications|cases|news|
parts|operations-inbox|heritage on each container start, fixing any
files that previously slipped through with the wrong group. Single
fast pass, idempotent. Adds /app/public/heritage to the list (was
missing).

3) AUTO-CREATE ASSET BUCKETS on entity create
The big editor UX win: when an admin creates a Case (GlobalNode), an
Application or a News article in HQ Command, the server now also
mkdir's the well-known asset subfolders for that entity. So after
creating "Acme Industries" as a case, the editor immediately gets
/public/cases/acme-industries/{videos,renders,gallery,datasheet,models}
ready — no more "EACCES because the dir wasn't created" gotcha
when they upload their first video.

Implementation:
- src/lib/assetFolders.ts: typed helper with per-scope bucket lists
  + a titleToSlug helper that mirrors the front-end's slugger so the
  folder name matches what ApplicationClient expects when rendering
  /cases/<slug>/videos/<file>.
- network/actions.ts: createNode -> ensureAssetFolders("cases", slug).
  Plus a new server action ensureNodeAssetFolders(id) so the editor
  can fix existing nodes without recreating them (one-click "Repair").
- news/actions.ts: createNewsArticle -> ensureAssetFolders("news",slug)
- applications/actions.ts: createApplication -> ensureAssetFolders(...)
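
The helper described above might be sketched as follows. This is an illustration under stated assumptions, not the contents of src/lib/assetFolders.ts: only the "cases" bucket list comes from the commit message, the "news" and "applications" lists are placeholders, the slug rules are a guess at what the front-end slugger does, and the explicit `publicRoot` parameter is hypothetical.

```typescript
import { mkdir } from "node:fs/promises";
import { join } from "node:path";

type Scope = "cases" | "news" | "applications";

// Only the "cases" list is taken from the commit message; the rest are placeholders.
const BUCKETS: Record<Scope, readonly string[]> = {
  cases: ["videos", "renders", "gallery", "datasheet", "models"],
  news: ["gallery"],
  applications: ["videos", "gallery"],
};

// Assumed slug rules: lowercase, strip accents, collapse other runs to "-".
function titleToSlug(title: string): string {
  return title
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Creates <publicRoot>/<scope>/<slug>/<bucket> for every bucket of the scope.
async function ensureAssetFolders(
  scope: Scope,
  slug: string,
  publicRoot: string,
): Promise<string[]> {
  if (slug === "") {
    throw new Error("ensureAssetFolders: empty slug");
  }
  const ensured: string[] = [];
  for (const bucket of BUCKETS[scope]) {
    const dir = join(publicRoot, scope, slug, bucket);
    // recursive: true makes this a no-op when the folder already exists.
    await mkdir(dir, { recursive: true });
    ensured.push(dir);
  }
  return ensured;
}
```

Because `mkdir` with `recursive: true` does not fail on existing directories, the same function can back both the create actions and a "Repair" action for nodes that predate this change.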

DEPLOY (David)
  cd /opt/flux-srl
  git pull
  docker compose up -d --build app
  # The entrypoint will fix existing 1001:65533 files automatically
  # as the container boots — no manual chown needed.

2026-05-05 08:01:45 -05:00

davidherran

ba002ea9e6

fix: auto-chown mounted volumes + metadataBase warning

Deploy to VPS / deploy (push) Has been cancelled

THREE FIXES IN ONE SHOT.

1. UPLOAD EACCES (the crashing one)
The /app/public/branding upload was failing with EACCES because the
folder on the host was created by `debian` (uid 1000) but the container
runs as nextjs (uid 1001). Docker bind mounts preserve host ownership,
so the container couldn't write into branding/.

Fix: introduce a docker-entrypoint.sh that runs the container briefly
as root, chowns every public/* mount to uid 1001, runs Prisma migrate
deploy, then drops to nextjs via `su-exec`. From now on every deploy
self-heals permissions across all asset folders (branding, footage,
applications, cases, news, parts, operations-inbox) — even if a future
volume gets added with the wrong owner.

Dockerfile changes:
- Adds `su-exec` package (lightweight gosu equivalent for Alpine)
- Removes the static USER directive (entrypoint manages user transitions)
- Replaces CMD with an ENTRYPOINT pointing at the new script

2. metadataBase WARNING
Server logs were emitting:
  ⚠ metadataBase property in metadata export is not set ... using "http://localhost:3000"
That's the layout's generateMetadata not declaring metadataBase, so
Next.js couldn't resolve relative OG/Twitter image URLs to absolute
ones. Reading NEXT_PUBLIC_APP_URL (already set in docker-compose env)
and feeding it as `metadataBase: new URL(...)` silences the warning
and produces correct absolute URLs in social previews.
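
One way to derive the base URL is sketched below. `resolveMetadataBase` and its localhost fallback are assumptions for illustration; the commit message only states that NEXT_PUBLIC_APP_URL is passed to `new URL(...)`.

```typescript
// Hypothetical helper: turn NEXT_PUBLIC_APP_URL into a URL, falling back to
// the local dev origin when the variable is missing or malformed.
function resolveMetadataBase(appUrl: string | undefined): URL {
  const fallback = "http://localhost:3000";
  const candidate = appUrl !== undefined && appUrl.trim() !== "" ? appUrl.trim() : fallback;
  try {
    return new URL(candidate);
  } catch {
    // new URL() throws on invalid input; never let metadata generation crash.
    return new URL(fallback);
  }
}

// In a Next.js App Router layout this would be used roughly as:
//   export async function generateMetadata(): Promise<Metadata> {
//     return { metadataBase: resolveMetadataBase(process.env.NEXT_PUBLIC_APP_URL) };
//   }

// How a relative Open Graph image path resolves against that base.
function absoluteImageUrl(relativePath: string, base: URL): string {
  return new URL(relativePath, base).href;
}
```

With a base set, relative image paths in the metadata resolve against the public origin rather than localhost.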

3. PERMISSIONS DOCS
The entrypoint chown is idempotent and silent on non-existent folders,
so future volumes added to docker-compose just work. No more "did you
sudo chown the new folder" gotchas.

DEPLOY (David)
  cd /opt/flux-srl
  # one-time fix for the existing branding folder so the next deploy
  # doesn't have to chown 65MB of data — but the entrypoint now handles
  # this automatically anyway:
  sudo chown -R 1001:1001 /opt/flux-srl/public/branding
  git pull
  docker compose up -d --build app

2026-05-04 18:17:39 -05:00

3 Commits