fix(security+db): close the real audit findings (SEC-04/05/01, DB-01)
Deploy to VPS / deploy (push) Has been cancelled

Acts on the verified findings from the 2026-06 audit (docs/AUDIT_2026-06_
VERIFIED.md). The audit's #1 "middleware never runs" was a false positive
(verified in prod: /hq-command redirects to login). These are the genuine
gaps:

- SEC-04 (HIGH): /api/assets (GET/POST/PUT/DELETE/PATCH) and
  /api/branding/favicon (POST) had NO auth. The middleware matcher excludes
  /api, so they were world-reachable — anyone could list/upload/rename/
  delete CMS files or regenerate the favicon. Added a new getAdminSession()
  helper (src/lib/session.ts) and a requireAdmin() guard on every handler.

- DB-01 (HIGH): the ClientUser table (B2B client portal) was defined in the
  schema but NEVER created by any migration, and OperationsSignal.clientId +
  its FK were missing too. B2B register/login failed at runtime; the
  dashboard silently showed 0 clients. New additive migration
  20260609120000_add_client_user creates the table, the unique email index,
  the clientId column (IF NOT EXISTS), and the FK (duplicate-object guarded).

- SEC-05 (MED-HIGH): operations.ts generateRichEmailHtml() interpolated
  item.title/sku/quantity, clientName/Company/Email/Phone and the free-text
  message straight into HTML — stored XSS into the team's internal inbox.
  Now escaped via escapeHtml/escapeAttr/safeMailto; file links validated to
  internal paths only.

- SEC-01 (MED): removed the hardcoded SESSION_SECRET fallback in src/proxy.ts;
  it now validates lazily and throws if the secret is missing (mirrors
  session.ts), so a runtime env failure can't fall back to a public key.

Verified: next build compiles with SESSION_SECRET unset (Docker parity),
TypeScript clean, prisma schema valid, golden tests 17/17.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-09 22:40:20 -05:00
parent b76c14b780
commit 18d5ed87c8
7 changed files with 562 additions and 17 deletions
+458
View File
@@ -0,0 +1,458 @@
# FLUX SRL — Audit Verification & Corrected Priorities
**Date:** 2026-06-09
**Method:** 10-dimension multi-agent audit (59 agents, adversarial verification) + manual re-verification of every critical/high finding against the running production system.
> **Why this file exists:** the automated audit (full report below) is thorough but produced one **false-positive CRITICAL** and overstated a few others. This top section is the corrected, ground-truth verdict. Trust this section over the raw report where they disagree.
---
## Manual verification results (the corrections)
### ❌ SEC-02 "Auth middleware never runs" — FALSE POSITIVE
The audit's #1 critical claimed `src/proxy.ts` is never executed because it exports `proxy` not `middleware`.
**Verified in production:** `curl https://www.rf-flux.com/hq-command/dashboard` (no cookie) → **307 redirect to `/hq-command/login`**. The middleware **is running**. Next.js 16 *does* recognize `src/proxy.ts` exporting `proxy` (the rename is real in Next 16; the code comment was correct). The empty `middleware-manifest.json` in a local build is misleading — runtime behavior disproves the claim.
**Consequence:** the HQ admin surface **is** protected by the middleware. SEC-03 (below) drops from critical to low.
### ⚠️ SEC-03 "HQ server actions lack auth" — OVERSTATED (low, not critical)
Because the middleware *does* run and its matcher covers `/hq-command/*`, POSTs to those routes (which is how server actions are invoked) are gated — no cookie → redirect before the action executes. Adding an in-action `getSession()` check is still good defense-in-depth, but it is **not** an open door.
### ✅ SEC-04 "/api/assets + /api/branding/favicon unauthenticated" — REAL (HIGH) — the actual top security issue
Verified: neither route checks a session, and the middleware matcher **excludes `/api`**, so the middleware does not protect them. Anyone can `GET/POST/PUT/DELETE` `/api/assets` — list CMS structure, upload files into public scopes, rename/delete content — and POST `/api/branding/favicon`. **This is the real #1.**
### ✅ DB-01 "ClientUser table never migrated" — REAL (HIGH), impact nuanced
Verified: the init migration creates 10 tables; **`ClientUser` is not among them**, and no later migration adds it. Code calls `prisma.clientUser` in `clientAuth.ts` (B2B register/login) and the dashboard.
**Real impact:** the **B2B client portal (register/login) is broken** at runtime. The HQ dashboard does *not* crash (its counts run inside a try/catch and silently show 0 clients), so the audit's "undeployable crash" framing was overstated — but the B2B portal genuinely doesn't work.
### ✅ SEC-05 "Operations email not HTML-escaped" — REAL (MEDIUM-HIGH)
Verified: `src/app/actions/operations.ts → generateRichEmailHtml()` interpolates `item.title`, `payload.clientName`, `clientCompany`, etc. straight into HTML with no `escapeHtml()`. Stored-XSS into the team's internal operations inbox from the CartDrawer form. Real.
### ✅ SEC-01 "Hardcoded secret fallback in proxy" — REAL but MITIGATED (MEDIUM)
`src/proxy.ts:13` still has `|| "FLUX_SUPER_SECRET_KEY_2026_ARCHITECTURE"`. Mitigated because signing (`session.ts`) throws without the env var, so tokens can't be forged in normal operation. Should still be removed (the verifier asymmetry is real).
### ✅ INFRA-03 "Secrets in git" — REAL (known)
OpenAI key + Gmail app password are in git history. Rotate; convert tracked `env``.env.example`.
---
## Corrected priority list — what actually matters
### 🔴 NOW (real, small, do first)
1. **Auth on `/api/assets` + `/api/branding/favicon`** (SEC-04) — add `getAdminSession()` guard at the top of each handler. *Effort: S*
2. **Create the `ClientUser` migration** (DB-01) — additive migration; unblocks the B2B portal. *Effort: S*
3. **Escape HTML in operations emails** (SEC-05) — reuse `escapeHtml` like consultation/route already does. *Effort: S*
4. **Remove the proxy secret fallback** (SEC-01) — throw if `SESSION_SECRET` missing, mirroring `session.ts`. *Effort: S*
5. **Whitelist chat context fields** (AI-01) — sanitize `context.section/activeTab` before injecting into the system prompt (prompt-injection hygiene). *Effort: S*
6. **Rotate OpenAI key + SMTP password; untrack `env`** (INFRA-03). *Effort: S — needs the client for the key*
### 🟡 NEXT (operational resilience — all real)
- Automated nightly **DB backups** (none today) + offsite. *M*
- **Container memory/CPU limits** + Nginx healthcheck in compose. *S*
- `buildSystemPrompt()` **try/catch + static fallback** so chat degrades instead of 500 if DB is down (TEST-02). *S*
- **Validate `OPENAI_API_KEY`** at startup + wrap `streamText` in try/catch (TEST-03). *S*
- **Idempotent consultation** submission (no orphan records on SMTP fail) (TEST-04). *M*
- **Monitoring/uptime** (UptimeRobot on `/api/health`) + log aggregation. *M*
- Index `GlobalNode.application`; cache the system prompt (PERF-01/03). *S*
### 🟢 LATER (polish/scale)
- Defense-in-depth `getSession()` inside HQ actions (SEC-03). a11y pass (focus trap, labels, skip link, reduced-motion). i18n: number formatting + translate error boundaries. SEO: image/video sitemaps, FAQ/VideoObject schema. Reduce `any`, split oversized files, adopt `log.*` everywhere. Jsonb migration for hot JSON fields. Lazy-load Three.js.
---
## What's genuinely strong (verified)
CSRF on public endpoints · magic-byte upload validation · JWT signing that throws on missing secret · **the middleware DOES protect HQ** · complete 5-locale parity · the AI translation glossary · hreflang/canonical/breadcrumb SEO · Nginx edge caching + the new canonical-host guard + scanner blocking · structured logger + safe-JSON helpers + escapeHtml exist (gap is adoption consistency) · clean idempotent migrations · bcrypt password hashing · well-designed 9-tool SPIN FluxAI.
---
## Dimension scorecard (audit, with my adjustments)
| Dimension | Audit | Adjusted | Note |
|---|---|---|---|
| Security | 3.5 | **6.5** | SEC-02 false positive removed; middleware protects HQ. Real gaps: SEC-04/05/01. |
| Performance | 6.0 | 6.0 | Solid baseline; prompt rebuilt per message. |
| Code quality | 5.5 | 5.5 | `any` usage, logging inconsistency. |
| Database | 4.5 | **5.5** | ClientUser migration missing (B2B), but not a dashboard crash. |
| FluxAI | 6.0 | 6.0 | Strong tools; add prompt-injection guard + history. |
| SEO | 6.5 | 6.5 | Good base; add media sitemaps + FAQ/Video schema. |
| i18n | 6.0 | 6.5 | Excellent infra; few hardcoded strings. |
| Infra | 4.0 | 4.5 | No backups/monitoring; secrets in git. The real weak spot. |
| Accessibility/UX | 4.5 | 5.0 | a11y gaps; dark mode now covered. |
| Testing/reliability | 3.0 | 3.5 | One suite; add API integration tests + degradation. |
**Adjusted overall: ~6.0/10** (audit said 4.8 — the false-positive critical dragged it down). The build is solid; the genuine priorities are a handful of small auth/data fixes plus operational resilience (backups + monitoring).
---
---
# Appendix — Full automated audit report (raw, uncorrected)
> The section below is the unedited 10-dimension report. Where it conflicts with the verification above (notably SEC-02), the verification above is authoritative.
# FLUX SRL — Consolidated Audit Report
**Project:** rf-flux.com (Flux SRL) — Next.js 16 · Prisma 7 · PostgreSQL · AI SDK 6 · 5 locales · Docker + Nginx on OVH
**Audit scope:** 10 dimensions, adversarially verified (each critical/high finding independently re-checked against source)
**Date:** 2026-06-09
**Auditor:** Lead consolidation pass
---
## 1. Executive Summary
Flux SRL is a capably-built marketing + B2B platform with genuinely good bones — multi-stage Docker, CSRF on public endpoints, magic-byte upload validation, a structured logger, a working AI translation glossary, complete 5-locale message parity, and a recently-hardened Nginx (canonical-host guard, scanner blocking). But the audit surfaced a **cluster of authorization failures that, taken together, mean the entire HQ admin surface is currently unprotected**, plus operational gaps (no backups, no monitoring, secrets in git) that make this risky to run in production as-is.
The single most important finding: the auth middleware **does not execute at all** because the file exports a function named `proxy` instead of `middleware`. Every downstream auth assumption collapses from that one fact — and because the HQ server actions and the `/api/assets` + `/api/branding` routes have *no* internal session checks, they are reachable by anyone.
### Overall health: **4.8 / 10** (weighted toward the security + infra blockers)
### Dimension scorecard
| Dimension | Score | One-line verdict |
|---|---|---|
| Security | 3.5 | Middleware never runs; HQ actions + asset APIs fully unauthenticated. |
| Performance | 6.0 | Solid caching baseline; chat rebuilds prompt + 4 DB queries every message. |
| Code quality | 5.5 | 187 `any` usages, inconsistent logging, empty catch blocks; good helpers exist but under-used. |
| Database | 4.5 | `ClientUser` table referenced in code but never migrated — runtime crash risk. |
| FluxAI | 6.0 | Strong tool architecture; prompt injection via context, no history, plaintext message storage. |
| SEO | 6.5 | Good metadata + JSON-LD; no image/video sitemaps for a visual industrial site. |
| i18n | 6.0 | Excellent infra and parity; a few components ship hardcoded English. |
| Infra | 4.0 | No resource limits, no backups, no monitoring, secrets in tracked `env` file. |
| Accessibility/UX | 4.5 | No skip link, no Escape-to-close, no reduced-motion, broken Tailwind class. |
| Testing/reliability | 3.0 | One unit-test file; zero integration tests; multiple silent-fail paths. |
### Top 5 things to fix first
1. **Make the middleware actually run** (SEC-02). Rename/export `proxy` as `middleware` in `src/middleware.ts`. Until this lands, *nothing else in the auth layer matters* — the page-level guards are dead code.
2. **Add session checks inside HQ server actions and the asset/branding APIs** (SEC-03, SEC-04). Defense-in-depth: even with middleware fixed, these must self-verify. They are directly POST-able today.
3. **Rotate the leaked credentials and stop tracking `env`** (INFRA-03). The OpenAI key and Gmail app password are in git history. Rotate now; they're compromised regardless of what you do next.
4. **Create the missing `ClientUser` migration** (DB-01). Code calls `prisma.clientUser` in the dashboard and B2B auth; the table was never created. This is an active crash, not a hypothetical.
5. **Stand up backups + container resource limits** (INFRA-05, INFRA-01). No automated DB backup and no memory caps on an OVH single-VPS is a "lose everything on one bad day" setup.
---
## 2. Critical & High Findings (confirmed / partially-confirmed only)
> Severities below use the **adjusted** verdict from verification. Refuted findings are excluded (see §6).
### Security
**SEC-02 — Auth middleware never executes (CRITICAL)**
`src/proxy.ts:17` — The file exports `async function proxy(...)` plus `config`, but Next.js 16 only invokes a file named `middleware.ts` exporting `middleware`. The `.next/server/middleware-manifest.json` is empty, confirming it was never compiled. A code comment even documents the rename intentionally. **Impact:** every `/hq-command/*` route has zero middleware protection; direct navigation and prefetch hit the dashboard with no auth gate. **Fix:** create `src/middleware.ts` with `export { proxy as middleware }` and re-export `config`; verify the matcher covers `/hq-command`. Then redeploy and confirm the manifest is populated.
**SEC-03 — HQ server actions lack session verification (CRITICAL)**
`src/app/hq-command/dashboard/inbox/actions.ts:11-188``getSignals()`, `getClients()`, `approveAccessRequest()`, `updateSignalStatus()`, `resolveAndCleanSignal()`, `deleteSignal()`, `resendSignalEmail()`, `deleteClient()` run raw Prisma mutations with no `getSession()` call. Marked `"use server"` but self-unauthenticated. **Impact:** these are invokable directly via POST (curl/fetch) — approve/delete B2B clients, mutate signals, trigger emails, all without a token. **Fix:** add a `requireAdminSession()` helper and call it at the top of every action: `const s = await getAdminSession(); if (!s) throw new Error('Unauthorized');`. Don't rely on the middleware alone.
**SEC-04 — `/api/assets` and `/api/branding/favicon` are unauthenticated (CRITICAL, raised from high)**
`src/app/api/assets/route.ts`, `src/app/api/branding/favicon/route.ts` — All methods (GET/POST/PUT/DELETE/PATCH) on `/api/assets` and POST on the favicon route have no session check. Path sanitization prevents traversal but not auth. **Impact:** anyone can enumerate CMS structure, upload SVG-with-JS into public scopes, delete branding/content, or regenerate favicons → defacement/XSS/downtime. **Fix:** at the top of each handler, `const s = await getAdminSession(); if (!s) return NextResponse.json({error:'Unauthorized'},{status:401});`.
**SEC-01 — Hardcoded session-secret fallback in proxy (HIGH, lowered from critical)**
`src/proxy.ts:13``process.env.SESSION_SECRET || "FLUX_SUPER_SECRET_KEY_2026_ARCHITECTURE"`, used for JWT *verification* at line 35. Verifier confirmed it's mitigated in practice: `session.ts`/`clientAuth.ts` *throw* if the secret is unset at signing time, so tokens can't be forged under normal operation. The danger is the asymmetry — if env loading ever fails at runtime, the verifier would accept tokens signed with the public string. **Fix:** remove the fallback; throw loudly if `SESSION_SECRET` is missing, mirroring `session.ts`.
**SEC-05 — Missing HTML escaping in operations email templates (HIGH)**
`src/app/actions/operations.ts:119-156``generateRichEmailHtml()` interpolates `item.title`, `item.sku`, `payload.clientName`, `clientCompany`, `clientEmail`, `clientPhone`, `message` straight into HTML. `escapeHtml()` exists in `src/lib/escapeHtml.ts` but isn't imported here. **Impact:** HTML/script injection into internal operations emails from the CartDrawer form. **Fix:** escape every user field before interpolation, matching the pattern already correct in `src/app/api/consultation/route.ts`.
### Database
**DB-01 — `ClientUser` table referenced but never migrated (CRITICAL)**
`prisma/schema.prisma:370-387` defines `ClientUser` and the `OperationsSignal.clientId` relation (lines 226-227), but **no migration creates the table** — not the init migration nor any of the 5 later ones. Code actively calls it: `src/app/hq-command/dashboard/page.tsx:67-68` (`prisma.clientUser.count()`), `src/app/actions/clientAuth.ts:25,30,50+` (findUnique/create/update). **Impact:** immediate runtime failure on any B2B auth or dashboard-count path; undeployable. **Fix:** generate a migration creating `ClientUser` (id, email unique, passwordHash, fullName, companyName, phone?, isApproved default false, lastLoginAt?, createdAt, updatedAt) and the FK from `OperationsSignal.clientId`. Run `prisma migrate dev` and verify against a fresh DB.
**DB-04 — JSON stored as `String` instead of `Jsonb` (HIGH)**
`prisma/schema.prisma` — 24+ JSON fields across 12 models (`translationsJson`, `galleryJson`, `sectionsJson`, `payloadJson`, etc.) are `String`. **Impact:** no DB-level JSON queries/indexing/validation; all parse/serialize is manual (and is the root cause of several CQ-03/CQ-04 parse-error findings). **Fix:** migrate query-hot fields first (`payloadJson` on AiEvent, `galleryJson`/`sectionsJson` on Application) to `Jsonb` via add-column → backfill → drop → rename.
**DB-05 — AI telemetry grows unbounded; no retention policy (HIGH)**
`prisma/schema.prisma:322-365``AiConversation`/`AiEvent` persist every chat indefinitely. No TTL, cleanup job, or purge anywhere in `src/`. Stores hashed IP + userAgent + full message text → GDPR Art. 5(1)(e) storage-limitation exposure. **Fix:** add a `retentionDays` setting (default 90), a daily cleanup deleting rows older than the window, and document it. Consider archival before delete.
**DB-03 — `GlobalNode.application` has no referential integrity (MEDIUM, lowered from high)**
`prisma/schema.prisma:40` — plain `String`, validated only as non-empty in `network/actions.ts:34`. Verifier downgraded: orphaned nodes persist and are simply excluded from filtered queries (deterministic, not data loss); UI dropdown prevents most bad input. **Fix:** validate `Application.findUnique({where:{slug}})` exists before create, or migrate to an `applicationId` FK.
**DB-02 — `PageContent` missing `createdAt` (MEDIUM, lowered from high)**
`prisma/schema.prisma:266` — has `updatedAt` only; every other model has both (except `SiteSetting`). **Impact:** no creation audit trail for page content. **Fix:** add `createdAt DateTime @default(now())`.
### Infrastructure
**INFRA-01 — No container resource limits (CRITICAL)**
`docker-compose.yml:41-116` — neither `app` (with `restart: always`) nor `nginx` nor postgres define memory/CPU limits. **Impact:** a runaway/leaking process consumes all VPS RAM and OOM-kills siblings → documented downtime. **Fix:** `app` mem_limit 1G (reservation 512M), nginx 256M/128M, cpus 1.0 each; load-test.
**INFRA-03 — Secrets committed in tracked `env` file (CRITICAL)**
`env:29-40` — real `OPENAI_API_KEY` (sk-proj-…), `SMTP_USER`, and a 16-char Gmail app password, tracked since first commit `fc24313`. `.gitignore` pattern `.env*` doesn't match the bare `env` filename. **Fix (in order):** (1) **rotate the OpenAI key and Gmail app password now** — they're already compromised; (2) `git rm --cached env`, add `env` to `.gitignore`; (3) commit `.env.example` with placeholders; (4) move to GitHub Secrets / a secrets manager. History scrub is optional but the rotation is not.
**INFRA-05 — No automated database backups (CRITICAL)**
`src/app/hq-command/dashboard/health/actions.ts:31-52` — backup is a manual HTTP export of unencrypted JSON covering only 10 of 16 tables (omits HeroSlide, SiteSetting, AiConversation, AiEvent, ClientUser, TeamMember). No cron, no offsite, no retention, no WAL/PITR. **Impact:** total loss if the PG volume corrupts. **Fix:** nightly `pg_dump | gzip` to offsite (S3, encrypted), 30-day retention; enable WAL archiving for PITR; document + test RTO/RPO monthly.
**INFRA-02 — Nginx has no healthcheck; unconditional `depends_on` (HIGH, lowered from critical)**
`docker-compose.yml:93-116` — nginx lacks a healthcheck and depends on `app` with no `condition`, while `app` correctly uses `condition: service_healthy` for postgres. Verifier softened the "deadlock" claim: it's a startup race, not a hang — nginx proxies to the unready app and returns 502/503 during the ~40s start window. **Fix:** add an nginx healthcheck (tcp:80) and `depends_on: app: condition: service_healthy`.
**INFRA-04 — Deploy script has no error handling or rollback (HIGH, lowered from critical)**
`.github/workflows/deploy.yml:28-51` — the health check at line 49 uses `curl -sf … || echo`, swallowing failure so `script_stop` is bypassed; migrations (line 42) run post-live with no error handling; no snapshot, no rollback. **Impact:** a failed migration or post-deploy crash leaves prod broken with manual-only recovery. **Fix:** make the health check fail the deploy (`curl -sf … || exit 1`, retried 3× with backoff); take a pre-deploy snapshot; restore on failure.
**INFRA-06 — No monitoring, alerting, or log aggregation (HIGH, lowered from critical)**
`docker-compose.yml` — structured logger + `/api/health` exist, but nothing ships metrics/logs to any platform (no Prometheus/Grafana/Sentry/Loki). **Impact:** blind operations; degradation invisible until users complain. **Fix:** expose `/metrics` (prom-client), ship structured logs to Loki/CloudWatch, alert on mem >80% / error-rate >1% / p99 >500ms.
**INFRA-07 — Missing OCSP stapling (HIGH)**
`nginx/conf.d/flux.conf:68-82` — no `ssl_stapling`. **Impact:** slower TLS handshakes; visitor IPs leak to Let's Encrypt. **Fix:** add `ssl_stapling on; ssl_stapling_verify on; ssl_trusted_certificate …/chain.pem; resolver 1.1.1.1 8.8.8.8 valid=300s;`.
**INFRA-11 — CSP allows `unsafe-inline` + `unsafe-eval` (MEDIUM, lowered from high)**
`nginx/conf.d/flux.conf:87` — present in `script-src` for Next.js hydration. Verifier noted strong compensating controls (Zod validation, `escapeHtml`, magic-byte checks, minimal `dangerouslySetInnerHTML`), so it's real technical debt rather than an active hole. **Fix:** move to nonce-based CSP; run `Report-Only` first.
**INFRA-08 — Nginx cache-poisoning surface via hidden `Set-Cookie` (MEDIUM, lowered from high)**
`nginx/conf.d/flux.conf:282-286``proxy_ignore_headers`/`proxy_hide_header Set-Cookie` on public pages. Verifier refuted the "session cookies lost" and "sensitive header disclosure" parts (authenticated requests bypass cache via `$cookie_flux_session`); the residual risk is poisoning public pages with a non-sensitive injected cookie. **Fix:** refine cache-bypass logic rather than blanket-hiding `Set-Cookie`.
**INFRA-09 — Large `client_max_body_size` without explicit body timeout (MEDIUM, lowered from high)**
`nginx/conf.d/flux.conf:71,135,148` — 500M limit; global block lacks `client_body_timeout`. Verifier softened: nginx's 60s default applies and upload endpoints set proxy timeouts. **Fix:** add `client_body_timeout 60s;` to the http block; reduce 500M where not genuinely needed.
**INFRA-10 — No logrotate (MEDIUM, lowered from high)**
`nginx/nginx.conf:39` — logs unbounded, but verifier noted `/var/log/nginx` isn't a mounted volume, so logs live in ephemeral container storage (purged on restart). Residual risk: log loss on restart and container-layer growth on long uptimes. **Fix:** add logrotate (daily, compress, keep 7, SIGHUP).
### Performance
**PERF-01 — System prompt rebuilt with 4 DB queries on every chat request (HIGH)**
`src/app/api/chat/route.ts:42-281``buildSystemPrompt()` runs `application.findMany` + 2× `globalNode.count` + `sparePart.count` (Promise.all) on every POST, with no caching layer (the `promptCacheKey` is a no-op). **Impact:** 4 round-trips per message; 400+/min at 100 concurrent users. **Fix:** cache the built prompt in-memory with a 3060s TTL, invalidated on CMS write.
**PERF-03 — Missing index on `GlobalNode.application` (HIGH)**
`prisma/schema.prisma` — queries filter on `(application, isActive)`, `(nodeType, isActive, application)`, and `(application, nodeType, isActive)` (applications page 136-142; chat 503, 621) but there's no `application` index. **Impact:** full table scans on the synchronous FluxAI search path. **Fix:** add `@@index([application])`, `@@index([application, isActive])`, `@@index([application, nodeType])`; migrate.
**PERF-02 — Synchronous `fs.readdirSync` in SSR paths (MEDIUM, lowered from high)**
`src/app/[locale]/page.tsx:102`, `src/app/[locale]/applications/[slug]/page.tsx:32` — real blocking calls, but verifier downgraded: they're fallback paths inside try/catch on small dirs, hit during 60s ISR regen, not per-request. **Fix:** switch to `fs.promises.readdir`, or remove the fallback once HeroSlide/CMS migration is complete.
### FluxAI
**AI-01 — Prompt injection via unvalidated context fields (HIGH)**
`src/app/api/chat/route.ts:215-216``context.section` and `context.activeTab` come from `req.json()` (TS types only, no runtime validation) and are interpolated into `contextNote`, concatenated to the system prompt at line 287. A manipulated frontend store can inject instructions overriding FluxAI's personality/tool limits. **Fix:** whitelist `context.section`/`activeTab` against the known section set; reject anything else (Zod).
**AI-03 — User message text stored unencrypted (HIGH)**
`src/app/api/chat/route.ts:268` — full user text (≤8000 chars) persisted as plaintext in `AiEvent.payloadJson` (`String`). Customer questions about volumes/processes sit in cleartext. **Impact:** breach exposes competitive intel; GDPR Art. 32 (encryption at rest). **Fix:** encrypt sensitive payloads (pgcrypto/KMS) or store only industry label + tool names; pair with the DB-05 retention policy.
**AI-02 — No conversation history / resume (HIGH)**
`src/components/ai/SilentObserver.tsx:76-78``useChat` initializes fresh with no `initialMessages`; backend persists everything but the UI never fetches it. **Impact:** multi-turn B2B sales context lost on refresh, undermining the funnel-tracking investment. **Fix:** add `/api/chat/history?sessionId`, hydrate `initialMessages` on mount, offer "continue previous conversation?".
**AI-06 — Telemetry write errors silently swallowed (HIGH)**
`src/app/api/chat/route.ts:272-274, 370-371` — telemetry writes are wrapped in try/catch with only `log.warn()`; the `onFinish` writes can fail after the response streamed, losing conversation records. **Impact:** undercounted funnel analytics. **Fix:** circuit-breaker + alert on >5 failures/hour; buffer-and-retry unsent events.
### SEO
**SEO-02 — No image or video sitemaps (HIGH)**
`src/app/sitemap.ts:1-106` — only page URLs. Rich media (news `coverImage`/`galleryJson`, application galleries, heritage `mediaUrl` videos) is invisible to Google Image/Video Search; `robots.ts` declares only `/sitemap.xml`. **Fix:** add `sitemap-images.xml` (`<image:image>` per article cover/gallery + app hero) and `sitemap-videos.xml` (heritage videos with title/description/duration); declare both in robots.
### i18n
**I18N-01 — Hardcoded English in EnergySavingsCalculator (HIGH)**
`src/components/ai/EnergySavingsCalculator.tsx:61,118,146-149,156,164-178` — 13+ literal strings ("Annual Savings", "CO2 Reduced", "Payback", "Request Detailed Engineering Study", …); component never imports `useTranslations`, and no `EnergySavingsCalculator` namespace exists in any of the 5 locale files. **Impact:** IT/ES/DE/VEC users see English in a customer-facing calculator. **Fix:** add the namespace to `en.json`, translate to all 4 locales, wire `useTranslations("EnergySavingsCalculator")`.
**I18N-02 — Hardcoded alert in CartDrawer (HIGH)**
`src/components/layout/CartDrawer.tsx:43``alert("You must accept the privacy policy.")` bypasses the otherwise-used translation system; a modal blocker shown in English to all locales. **Fix:** move to a translated key and prefer the Toast component over native `alert()`.
### Accessibility / UX
**A11Y-02 — Broken Tailwind class breaks nav styling (HIGH)**
`src/components/layout/NavBar.tsx:190``"text-[#86868B hover:text-[#1D1D1F]"` is missing a `]`. **Impact:** the color class doesn't apply to inactive nav links in light mode. **Fix:** `"text-[#86868B] hover:text-[#1D1D1F]"`.
**A11Y-03 — Icon-only buttons missing `aria-label` (HIGH)**
`src/components/layout/NavBar.tsx:229,278,294` — theme toggle, cart, mobile-menu announce only "button". **Fix:** add descriptive `aria-label`s (e.g., `aria-label={isDark ? 'Switch to light mode' : 'Switch to dark mode'}`).
**A11Y-04 — Modal can't be closed with Escape (HIGH)**
`src/components/ui/CaseStudyModal.tsx:232-237` — no keydown handler; only a click-able close button (WCAG 2.1.1). **Fix:** `useEffect` keydown listener → `if (e.key==='Escape') onClose()`.
**A11Y-05 — No skip-to-main-content link (HIGH)**
`src/app/[locale]/layout.tsx:145-204` — ~13 interactive elements before main; main div has no `id` and isn't a `<main>` (WCAG 2.4.1). **Fix:** add `<a href="#main-content" class="sr-only focus:not-sr-only">Skip to main content</a>` and an `id`/`<main>` target.
**A11Y-08 — No `prefers-reduced-motion` handling (HIGH)**
`src/app/globals.css` — 260+ framer-motion animations, zero reduced-motion checks. **Impact:** vestibular-disorder risk. **Fix:** add `@media (prefers-reduced-motion: reduce){*{animation:none!important;transition:none!important}}` and gate heavy motion via `useReducedMotion()`.
**A11Y-09 — Hero videos lack captions (MEDIUM, lowered from high)**
`src/components/sections/HeroReel.tsx:71-83` — no `<track>`; verifier noted videos are muted/decorative with overlaid text, softening impact (WCAG 1.2.2). **Fix:** add captions where videos carry meaning, or mark decorative explicitly.
**A11Y-01 — Empty `alt` on cart product images (MEDIUM, lowered from high)**
`src/components/layout/CartDrawer.tsx:148``alt=""`; verifier noted adjacent visible title+SKU text mitigates. **Fix:** `alt={item.title}` for defense-in-depth.
### Code quality
**CQ-01 — 187 `any` usages across 48 files (HIGH)**
Concentrated in `ApplicationClient.tsx` (1208 lines), `chat/route.ts` (920 lines), `SilentObserver.tsx` (22 instances), despite `strict: true`. **Fix:** use existing `cms.ts` types (`AppFull`, `NodeFull`); add a `model-viewer.d.ts` for the web-component casts; chip away by file.
**CQ-02 — Inconsistent logging: 50 `console.error` vs 9 `log.error` (HIGH)**
A structured JSON logger exists (`src/lib/logger.ts`) but is bypassed in `sitemap.ts:75,102`, `heritage/page.tsx:210`, `mailer.ts:96`, `imageOptimizer.ts:117`, etc. **Impact:** unstructured output breaks Loki/CloudWatch parsing. **Fix:** replace `console.*` with `log.*` in server code; add a lint rule / pre-commit hook.
**CQ-04 — Empty catch blocks suppress JSON parse errors (HIGH)**
`ApplicationClient.tsx:618,826-829`, `news/[slug]/page.tsx:259`, `parts/page.tsx:55` — silent `catch(e){}` on dimensions/media/specs parsing. **Fix:** at minimum `catch(e){ log.warn('parse_failed', e) }`.
**CQ-03 — Unguarded `JSON.parse` for sections/advantages (MEDIUM, lowered from high)**
`ApplicationClient.tsx:1026-1027``data.sectionsJson`/`advantagesJson` parsed with no try/catch (verifier confirmed *this* pair; refuted the claims about lines 600/609/CaseStudyModal which do have guards). **Fix:** use the existing `parseJsonField` helper from `cms.ts`.
**CQ-07 — `any`-typed Prisma access in chat (MEDIUM, lowered from high)**
`chat/route.ts:413-414``(app: any)` access with inconsistent null-handling; verifier refuted the line-450 "undefined .score" claim (it's always initialized). Real issue is type-safety, not a live crash. **Fix:** use Prisma `select` with explicit types instead of `any`.
### Testing / reliability
**TEST-01 — No integration/e2e tests for critical API routes (HIGH)**
Only `tests/ai/golden.test.mjs` (17 unit tests). Zero coverage of `/api/consultation`, `/api/chat`, `/api/health`, `/api/public-upload`. **Fix:** add integration tests (Node test runner) with mocked Prisma/nodemailer; aim >80% on `src/app/api`.
**TEST-02 — `buildSystemPrompt()` has no DB-failure handling (HIGH)**
`chat/route.ts:42-72` — 4 parallel Prisma queries, no try/catch; called at line 281 before `streamText()`. If Postgres is down, the whole chat 500s instead of degrading. Asymmetric with the telemetry try/catch. **Fix:** wrap in try/catch → fall back to a static `DEFAULT_SYSTEM_PROMPT` (keeps personality/tools, omits dynamic counts).
**TEST-03 — OpenAI key never validated at startup or in health check (HIGH)**
`chat/route.ts`, `health/route.ts``openai('gpt-4o')` is called with no key validation; `/api/health` only does `SELECT 1`. **Impact:** invalid/missing key surfaces mid-stream after headers are sent. **Fix:** validate `OPENAI_API_KEY` at startup; wrap `streamText()` in try/catch returning a structured error (with `retryAfterSec` on 429).
**TEST-04 — Consultation email isn't idempotent; orphaned records on SMTP failure (HIGH)**
`api/consultation/route.ts:80-155` — signal created (line 105) before email send (line 141); on SMTP failure the client gets a 500 with no `ticketId`, retries duplicate, and the fallback route isn't re-attempted. **Fix:** add an idempotency key, return 202 with `ticketId` + `emailError` on send failure, and add a background retry with backoff.
---
## 3. Medium / Low / Info
**Security**
- SEC-06 (med) — `operations.ts:129-132`: file links built from unvalidated `fileUrl` → validate it starts with `/`.
- SEC-07 (low, downgraded) — `parts/page.tsx`: client-side gate not SSR redirect; data isn't leaked → optional `redirect()` for cleanliness.
- SEC-08 (med) — HQ actions have no rate limiting → rate-limit by admin id + action.
- SEC-09 (low) — `rateLimit.ts:144`: trusts `X-Forwarded-For` unvalidated → validate single-IP / trust only Nginx.
- SEC-10 (low) — `chat/route.ts:225`: `SESSION_SECRET` reused as telemetry HMAC salt → use a separate `VISITOR_HASH_SECRET`.
**Database**
- DB-06 (med) — `schema.prisma:230`: `AiConversation→OperationsSignal` no `onDelete` → orphans; document or cascade.
- DB-07 (med) — add `@@index([type, status, createdAt(sort: Desc)])` on OperationsSignal.
- DB-08 (low) — Application.order non-unique → validate uniqueness per `isActive` group.
- DB-09 (low) — NewsArticle.publishedAt has no future-date guard → filter `publishedAt <= now()`.
**Performance**
- PERF-04 (med) — `chat/route.ts:519`: JS keyword filter post-query → use Prisma `contains`/`mode:'insensitive'`.
- PERF-05 (med) — lazy-load `GlobalOperations` (Three.js ~300KB) via `dynamic({ssr:false})`.
- PERF-06 (med) — 48 `use client` components; push pure-display ones back to server components.
- PERF-07 / INFRA-12 (med) — Prisma pool `max=10` hardcoded → make `DB_POOL_MAX` env-configurable (2030 prod).
- PERF-08/10 (low) — ISR `revalidate=60` × 40 renders/min → consider 300600s or on-demand revalidation.
- PERF-09 (med) — large Docker build context → audit with `--progress=plain`, exclude heavy media.
- INFRA-13 (med) — Nginx upstream `keepalive=32` → raise to 128256.
**FluxAI**
- AI-04 (med) — no golden eval for tool selection → build 2030 query eval set, run monthly.
- AI-05 (low) — `stepCountIs(5)` may truncate → bump to 78, monitor traces.
- AI-07 (low) — no per-tool cost tracking → log `(conversationId, toolName, tokensIn/Out)`.
- AI-08 (med) — `SilentObserver.tsx:232,237`: tool failures silently omitted → render "no results" card.
- AI-09 (low) — duplicate of PERF-01 (prompt caching).
- AI-10 (med) — `useChat` has no `onError` → add error state + Retry button.
**SEO**
- SEO-01 (med) — OG images lack width/height/type → add `{width:1200,height:630,type:'image/jpeg'}`.
- SEO-03 (med) — no FAQPage schema → add `faqPageSchema()`.
- SEO-04 (med) — no VideoObject schema on heritage videos → add `videoObjectSchema()`.
- SEO-05/06 (low) — Product schema lacks AggregateRating/Offer; Article lacks keywords/commentCount.
- SEO-07 (med) — markdown parser allows empty `alt` → fallback `alt || 'Article image'` + CMS validation.
- SEO-08 (low) — add breadcrumb schema to home/team/heritage/news-hub.
- SEO-09 (low) — LocalBusiness hours lack timezone → add `Europe/Rome`.
- SEO-10 (med) — GlobalNode case studies lack structured data → add `caseStudySchema()`.
- SEO-11 (low) — add Twitter `creator`/`site` handles.
- SEO-12 (info) — no Core Web Vitals monitoring → add `web-vitals` + dashboard.
**i18n**
- I18N-03 (med) — error boundaries hardcoded English → translate `[locale]/error.tsx` at minimum.
- I18N-04 (med) — `formatNumber()` calls `toLocaleString()` without locale → pass active locale / `Intl.NumberFormat`.
- I18N-05 (med) — register `EnergySavingsCalculator` namespace in all 5 locale files (pairs with I18N-01).
- I18N-08 (low) — `getLocalizedData` fallback is English-only → optional locale-chain fallback.
**Accessibility / UX**
- A11Y-06 (med) — `ConsultationScheduler.tsx:394`: errors lack `aria-live`/`role=alert`.
- A11Y-07 (med) — form inputs lack `<label htmlFor>` associations.
- A11Y-10 (med) — 71 buttons default to `type=submit` → add `type=button` to non-submit buttons.
- A11Y-11 (med) — AI chat modal has no focus trap.
- A11Y-12 (med) — language dropdown lacks arrow-key navigation.
- A11Y-13 (med) — opacity-reduced nav text may fail 4.5:1 contrast → test + use solid colors.
- A11Y-14 (med) — success message lacks `role=status`/`aria-live`.
- A11Y-15 (low) — verify dark-mode coverage on newer pages + HQ panel.
**Code quality**
- CQ-05 (med) — oversized files (ApplicationClient 1208, chat/route 920, AssetBucketBrowser 874) → extract sub-components/tools.
- CQ-06 (med) — model-viewer `as any` casts → proper `.d.ts` type.
- CQ-08 (med) — inconsistent error-handling pattern across layers → standardize log-then-return/throw.
- CQ-09 (low) — `privacy/page.tsx:12`: hardcoded `privacy@rf-flux.com` (TODO unconfirmed) → env/SiteSettings + confirm with FLUX legal.
- CQ-10 (low) — repeated `as unknown as` casts on AI SDK responses → define interfaces.
- CQ-11 (low) — `i18nHelper.ts:27`: `console.error``log.error`.
**Infra / testing**
- INFRA-14 (med) — backup/restore lacks HMAC integrity → sign exports, verify on restore.
- INFRA-15 (med) — deploy doesn't verify commit signatures → `git verify-commit HEAD`.
- TEST-05 (med) — no i18n parity/fallback tests.
- TEST-06 (med) — error boundaries log to console, not the structured logger.
- TEST-07 (med) — server-component DB errors bubble ungracefully → per-query try/catch + degraded UI.
- TEST-08 (med) — no retry/backoff for transient email/Prisma/OpenAI failures.
- TEST-09 (med) — in-memory rate limit multiplies across replicas → require Redis for multi-instance.
- TEST-10 (med) — health check doesn't verify migrations/SMTP/OpenAI/env.
- TEST-11/12/13 (low) — telemetry truncation unlogged; no polyglot-file upload tests; `restoreDatabase()` partial-failure handling.
---
## 4. What's Already Strong
Credit where due — several things are done well and should be preserved:
- **Public-endpoint security baseline.** CSRF is correctly implemented on public endpoints, rate limiting covers chat, file uploads use magic-byte validation, and the consultation route already escapes HTML correctly (the model SEC-05 should copy). Recent commits added a **canonical-host guard and scanner-probe blocking** in Nginx — good hardening.
- **JWT signing discipline.** `session.ts`/`clientAuth.ts` correctly *throw* when `SESSION_SECRET` is missing rather than falling back — this is what saved SEC-01 from being critical.
- **i18n infrastructure.** Complete top-level key parity across all 5 locales (18 sections), correct next-intl integration, preserved `{count}`/`{app}` placeholders, and a well-built **AI translation glossary** that masks/restores protected brand/technical terms (FLUX, Radio Frequency, solid-state, RF, MHz/kHz/GHz, kW/kWh/MW). Pluralization (`componentFound`/`componentsFound`) is handled correctly.
- **SEO foundations.** Correct hreflang alternates, canonical URLs, breadcrumb JSON-LD on detail pages, and core schemas (Organization, LocalBusiness, WebSite, Article, Product). robots.txt disallow rules are correct; sitemap covers all public routes.
- **Caching & DB pooling.** Nginx edge caching, ISR, image optimization, and a real Prisma connection pool are all in place — a solid performance baseline.
- **Architecture awareness.** A structured JSON logger, safe-JSON helpers (`parseJsonField`), `escapeHtml`, and proper `cms.ts` types all exist — the gaps are *adoption consistency*, not missing infrastructure. bcrypt is used for password hashing; migrations are clean and idempotent (`IF NOT EXISTS` guards). FluxAI's 9-tool SPIN-funnel architecture is well-designed. Semantic HTML (`main`/`nav`/`header`/`footer`) and dark-mode coverage are largely in place.
---
## 5. Prioritized Remediation Roadmap
### NOW — blockers; do before any further production traffic
| Item | Finding | Effort |
|---|---|---|
| Rename/export middleware so auth actually runs | SEC-02 | **S** |
| Add session checks inside HQ server actions | SEC-03 | **M** |
| Add auth to `/api/assets` + `/api/branding` | SEC-04 | **S** |
| Rotate leaked OpenAI key + Gmail password; untrack `env` | INFRA-03 | **S** |
| Create the missing `ClientUser` migration | DB-01 | **S** |
| Escape HTML in operations email templates | SEC-05 | **S** |
| Whitelist chat context fields (prompt injection) | AI-01 | **S** |
| Remove the proxy session-secret fallback | SEC-01 | **S** |
### NEXT — operational resilience + data integrity
| Item | Finding | Effort |
|---|---|---|
| Automated nightly offsite encrypted backups + WAL/PITR | INFRA-05 | **M** |
| Container memory/CPU limits | INFRA-01 | **S** |
| Nginx healthcheck + `condition: service_healthy` | INFRA-02 | **S** |
| Deploy: failing health check + snapshot/rollback | INFRA-04 | **M** |
| AI telemetry retention policy + cleanup job | DB-05 | **M** |
| Encrypt / minimize stored chat message text | AI-03 | **M** |
| `buildSystemPrompt()` try/catch + fallback prompt | TEST-02 | **S** |
| Validate OpenAI key at startup + wrap `streamText` | TEST-03 | **S** |
| Idempotent consultation submission | TEST-04 | **M** |
| Cache system prompt (TTL) | PERF-01 | **S** |
| Add `GlobalNode.application` indexes | PERF-03 | **S** |
| Integration tests for the 4 critical API routes | TEST-01 | **L** |
| Monitoring + log aggregation + alerts | INFRA-06 | **M** |
| Fix `aria-label`s, skip link, Escape-to-close, reduced-motion, broken nav class | A11Y-02/03/04/05/08 | **M** |
| i18n: EnergySavingsCalculator + CartDrawer alert | I18N-01/02/05 | **M** |
### LATER — hardening, polish, scale
| Item | Finding | Effort |
|---|---|---|
| Migrate hot JSON `String` fields → `Jsonb` | DB-04 | **L** |
| Nonce-based CSP (drop unsafe-inline/eval) | INFRA-11 | **M** |
| OCSP stapling, logrotate, body timeout, keepalive tuning | INFRA-07/09/10/13 | **S** |
| Image + video sitemaps; VideoObject/FAQ/CaseStudy schema | SEO-02/03/04/10 | **M** |
| Reduce `any` usage; extract oversized files; standardize logging | CQ-01/02/04/05 | **L** |
| Lazy-load Three.js; trim `use client`; configurable pool | PERF-05/06/07 | **M** |
| Conversation history/resume; tool eval set; useChat onError | AI-02/04/10 | **M** |
| Remaining a11y (focus trap, labels, contrast, button types) | A11Y-06/07/10/11/12/13/14 | **M** |
| Locale-aware number formatting; translate error boundaries | I18N-03/04 | **S** |
---
## 6. False Positives Considered (refuted / materially downgraded)
These were checked and either refuted or substantially softened during verification — listed so you know they were examined, not missed:
- **SEC-07 (B2B `/parts` not redirecting)** — *refuted as a security issue.* The page returns 200 but never exposes parts data (empty array), and shows a proper "Access Restricted" locked-state UI. A style/pattern preference, not a leak. Downgraded high → low.
- **CQ-07 line-450 "undefined `.score`"** — *refuted.* The `scored` array always initializes `score` (line 415, returned 443); `.score` can't be undefined there. The surrounding `any`-typing concern remains valid; the crash claim does not.
- **CQ-03 (lines 600/609 + CaseStudyModal "silent failure")** — *partially refuted.* Lines 600/609 are inside a try/catch with `.w/.h/.d` validation, and CaseStudyModal:189-197 has real try/catch. Only the `sectionsJson`/`advantagesJson` parse at lines 1026-1027 is genuinely unguarded.
- **INFRA-08 (cache poisoning)** — *partially refuted.* "Session cookies lost for users" and "sensitive header disclosure" are false (authenticated requests bypass cache via `$cookie_flux_session`). Residual risk is limited to poisoning public pages with non-sensitive cookies.
- **INFRA-02 / INFRA-04 (nginx + deploy)** — *softened, not refuted.* The "deadlock"/"no-recovery" framing was overstated (startup race + manual recovery, not hangs); both remain real high-severity gaps.
- **INFRA-09 / INFRA-10 (body timeout / logrotate)** — *softened.* nginx's 60s default timeout and ephemeral (unmounted) log storage reduce the blast radius; both downgraded high → medium.
- **DB-02 / DB-03 / PERF-02 / A11Y-01 / A11Y-09 / INFRA-11** — *confirmed but downgraded* (see §2) where verification found mitigating context (other models' parity, dropdown-constrained input, ISR-only timing, adjacent visible text, muted decorative video, compensating input-validation controls).
---
*Bottom line, David: the build quality is real, but the auth layer is currently a no-op and the platform has no backups, no monitoring, and live secrets in git. The "NOW" block is mostly small, surgical changes — SEC-02 alone is a one-file fix that re-activates a security layer you already wrote. Land that block first, then the operational resilience in "NEXT," and this moves from "risky to run" to "production-ready" quickly.*