# FLUX SRL: Website Engineering Report

**Project:** rf-flux.com platform

**Iteration:** Security hardening + FluxAI conversation analytics

**Date:** May 2026

**Prepared by:** DreamHouse Studios

---

## Executive Summary

This iteration delivers two parallel outcomes for `rf-flux.com`:

1. **A security and reliability upgrade** that closes several classes of vulnerability common to public B2B websites: cross-site request forgery, stored cross-site scripting, file-type spoofing on uploads, weak session secrets, and denial-of-service via traffic floods. The site now meets the baseline expected of an enterprise property.
2. **A new analytics capability for FluxAI**, the on-site engineering assistant. Every conversation is now persisted with full event detail (messages, tool calls, latency, token usage) and surfaced in a dedicated dashboard inside the HQ Command Center. The sales team can now measure funnel progression, top industries, and conversion-to-consultation rates directly from the system, rather than guessing from email traffic alone.

In numbers:

- **31 files** modified, created, or removed
- **+1,812 / –454 lines** of code (net +1,358)
- **10 new server-side modules** for security and analytics
- **2 new database tables** for AI conversation telemetry
- **6 new database indices** on hot filter columns
- **13 automated regression tests** added for the hardening modules
- **Zero breaking changes**: all database changes are additive

All work is verified by a successful production build (`next build`), TypeScript compilation with zero errors, and a passing automated test suite.

---

## 1. Security Hardening

### 1.1 Strong session enforcement

**Risk eliminated:** session hijacking by token forgery.

The previous code allowed the server to start with a hard-coded fallback secret (`"FLUX_SUPER_SECRET_KEY_2026_ARCHITECTURE"`) if the `SESSION_SECRET` environment variable failed to load.
Because that fallback string was visible in the source tree, any attacker who read the public repository could mint valid 7-day admin JWTs and walk into the HQ Command Center as any user.

The application now refuses to start without a `SESSION_SECRET` of at least 32 characters. A weak or missing value is a fatal error, surfaced at boot time rather than silently accepted. The same protection is applied to the B2B client portal authentication path (`clientAuth.ts`).

**Operational note:** the production VPS must have a strong secret in its `.env` file before the next deploy. The recommended generator is `openssl rand -base64 48`.

### 1.2 Cross-site request forgery (CSRF) on public form posts

**Risk eliminated:** automated form submission abuse, lead spam, and cross-site form-action attacks against `/api/consultation`.

The consultation form endpoint previously accepted any POST request with a valid JSON body. We implemented the **double-submit token pattern**:

- A dedicated endpoint (`GET /api/csrf`) mints a token signed with HMAC-SHA256 using the session secret. The token is delivered both as a cookie and in the JSON response body. It expires after one hour.
- The form's submission code copies the token into the `X-CSRF-Token` header.
- The consultation endpoint verifies that cookie and header match and that the HMAC is valid before processing any data.

Verification is stateless, so no database lookup is required. Tokens cannot be forged without the session secret. Because nothing is stored server-side, a captured token stays valid until it expires, so replay is bounded by the one-hour lifetime rather than prevented outright.

### 1.3 Strict input validation with Zod

**Risk eliminated:** malformed data in the database, malformed addresses in outbound email, length-based denial of service, and downstream injection.

Every field accepted by `/api/consultation` is now validated against a schema before any business logic runs:

- Name, company: required, max length 120 and 160 characters respectively
- Email: must match RFC 5321 email format, max 254 characters
- Phone, message, timeframe: bounded length
- Preferred contact channel: enum of `email | phone | whatsapp`
- Conversation insights, suggested topics: bounded arrays of bounded strings
- Optional URL fields: must be valid URLs

Malformed payloads are rejected with HTTP 400 and a structured error log entry, and never reach the database or email pipeline.

### 1.4 Cross-site scripting (XSS) in transactional email

**Risk eliminated:** stored XSS that could execute in the engineering team's inbox when opening a malicious consultation request.

The consultation email template concatenated client-supplied strings (name, company, email, message, AI-detected industry labels) directly into raw HTML. An attacker submitting a name that contained HTML markup (a script or image tag, for example) would have that markup rendered as live HTML when the email was opened in any permissive client.

We introduced a small escape library (`src/lib/escapeHtml.ts`) and applied it to every interpolated value in the template. Mail-to links are validated with a strict regex and URL-encoded before reaching the `href` attribute.

### 1.5 File-type validation by content, not extension

**Risk eliminated:** stored XSS and arbitrary code execution via malicious uploads on the public upload endpoint.

Previously, `/api/public-upload` trusted the file extension provided by the client. A user could rename `payload.html` to `image.png` and the server would save it as-is. Browsers reading the file later might still interpret it as HTML, depending on response headers, which is a classic attack vector.

We added a magic-byte detector (`src/lib/fileType.ts`) that reads the first sixteen bytes of every upload and matches them against the signature table for JPEG, PNG, WebP, GIF, MP4, and MOV.
Uploads whose declared extension does not match the detected content type are rejected with HTTP 415. The verification happens **before** the buffer is written to disk.

### 1.6 Distributed denial-of-service hardening

**Risk eliminated:** traffic floods that exhaust OpenAI quota, fill storage, or overwhelm Nginx worker capacity.

The previous rate limit was tied to a per-process in-memory map. That is acceptable for a single-container deploy (the current VPS), but the effective limit multiplies with the number of replicas in a multi-replica setup, so we made the implementation forward-compatible:

- A `RateLimitStore` abstraction with two implementations:
  - **In-memory** (default, zero new dependencies)
  - **Upstash Redis over REST** (auto-activates when the `REDIS_URL` and `REDIS_TOKEN` environment variables are set)
- Both implementations share the same token-bucket algorithm, so request semantics do not change when scaling.

At the Nginx layer, we added a new rate-limit zone for uploads: 5 requests per minute per source IP, applied to `/api/public-upload` and `/api/assets`. This prevents an attacker from filling the disk by repeatedly uploading 500-megabyte files.

### 1.7 Browser-layer security headers

**Risk reduced:** click-jacking, MIME confusion, referrer leakage, undesired device-API access, and reflected-XSS impact.

Nginx now emits a complete set of security response headers on every HTTPS response:

| Header | Purpose |
|---|---|
| `Content-Security-Policy` | Restricts which origins can serve scripts, styles, images, fonts, and network connections |
| `Strict-Transport-Security` | Pre-existing; forces HTTPS for two years |
| `X-Frame-Options: DENY` | Prevents the site from being embedded in iframes (click-jacking defense) |
| `X-Content-Type-Options: nosniff` | Disables MIME sniffing |
| `Referrer-Policy: strict-origin-when-cross-origin` | Prevents leaking the full URL to third-party links |
| `Permissions-Policy` | Blocks camera, microphone, and geolocation APIs |

The Content Security Policy allow-lists only `api.openai.com` and the Upstash REST endpoint for outbound connections. Inline scripts and styles remain permitted for now because Next.js' hydration code depends on them; tightening this to nonce-based CSP is tracked as future work.

---

## 2. Code Quality and Performance

### 2.1 Dead code removal

`GlobalOperations_old.tsx` (310 lines, no references) was removed. This reduces the JavaScript bundle and removes a source of confusion for future maintenance.

### 2.2 Eliminated polling-based session checks

The site's navigation bar previously checked `document.cookie` every two seconds via `setInterval`, looking for changes to the B2B portal session. Polling like this:

- Burns CPU cycles continuously, even when nothing has changed
- Is prone to memory leaks on rapid mount/unmount cycles
- Updates the UI with up to two seconds of lag after login or logout

We replaced it with an **event-driven** implementation:

- The authentication modal dispatches a `flux:session-changed` custom event immediately on successful login or logout.
- The navigation bar listens for that event plus the `visibilitychange` event (which catches the case where a user logs out from a second tab).
- No interval, no polling, no lag.
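The wiring described in section 2.2 might look roughly like the following sketch. Only the event names `flux:session-changed` and `visibilitychange` come from this report; the helper names `announceSessionChange` and `watchSession` are hypothetical, and the targets are passed in as plain `EventTarget` objects (in the browser these would be `window` and `document`) so the sketch also runs outside a browser.

```typescript
// Event name taken from the report; everything else here is illustrative.
const SESSION_EVENT = "flux:session-changed";

/** Announce a login or logout. In the browser, `target` would be `window`. */
function announceSessionChange(target: EventTarget): void {
  target.dispatchEvent(new Event(SESSION_EVENT));
}

/**
 * Call `onChange` when the session event fires on `windowTarget`, or when
 * `documentTarget` reports a visibility change (for example, returning from
 * another tab). Returns a cleanup function that removes both listeners.
 */
function watchSession(
  windowTarget: EventTarget,
  documentTarget: EventTarget,
  onChange: () => void,
): () => void {
  // Wrap the callback so the same function reference is added and removed.
  const handler = (): void => onChange();
  windowTarget.addEventListener(SESSION_EVENT, handler);
  documentTarget.addEventListener("visibilitychange", handler);
  return () => {
    windowTarget.removeEventListener(SESSION_EVENT, handler);
    documentTarget.removeEventListener("visibilitychange", handler);
  };
}
```

In a React component the returned cleanup function would typically be returned from `useEffect`, so the listeners are removed on unmount and nothing is left running in the background.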
### 2.3 Strict TypeScript across data-driven components

Several large React sections (`ApplicationsDashboard`, `GlobalOperations`) declared their database-shaped props as `any[]`. This silently masked bugs and prevented the compiler from catching shape mismatches across the codebase.

We introduced `src/types/cms.ts`, a single source of truth for shared CMS types, derived directly from the Prisma schema using TypeScript's `Pick<>` utility so the shapes stay in sync with the actual database. Component props were updated to use these named types. JSON-string fields (`galleryJson`, `dashboardMetricsJson`, etc.) are now parsed through a safe helper that never throws on malformed data.

### 2.4 Database indices on hot paths

Several Prisma queries filter by `isActive`, `category`, or `nodeType`, the fields that control which content is visible on the public site. None of those columns had indices, which meant every page render performed a full table scan.

We added the missing indices via a regular Prisma migration:

| Table | Index |
|---|---|
| `GlobalNode` | `isActive`, `nodeType`, composite `(nodeType, isActive)` |
| `Application` | `isActive`, `category` |
| `NewsArticle` | `isActive`, composite `(isActive, publishedAt DESC)` |
| `SparePart` | `isActive` |

For the current catalogue size (~50 records per table) the speed-up is small in absolute terms, but the cost of adding indices at this stage is trivial and the benefit grows as content scales.

### 2.5 Structured JSON logging

The codebase had `console.error` calls scattered through API routes and server actions, each writing free-form text that could not be parsed downstream.

We introduced `src/lib/logger.ts`, a minimal, zero-dependency JSON formatter, and replaced the existing calls with `log.info`, `log.warn`, and `log.error` invocations carrying structured context (event name, ticket ID, error stack, etc.).
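A minimal sketch of what such a formatter could look like follows. The function and field names (`formatLine`, `ts`, `level`, `event`) and the sample event are assumptions made for illustration; they are not taken from `src/lib/logger.ts`.

```typescript
type Level = "info" | "warn" | "error";
type Context = Record<string, unknown>;

// Hypothetical formatter: one JSON object per line, no dependencies.
// This sketch does not guard against circular references or BigInt values,
// both of which make JSON.stringify throw.
function formatLine(level: Level, event: string, context: Context = {}): string {
  const safe: Context = {};
  for (const [key, value] of Object.entries(context)) {
    // Error objects serialize to "{}" by default, so copy the useful fields.
    safe[key] =
      value instanceof Error
        ? { name: value.name, message: value.message, stack: value.stack }
        : value;
  }
  // Fixed fields are spread last so caller context cannot overwrite them.
  return JSON.stringify({ ...safe, ts: new Date().toISOString(), level, event });
}

const log = {
  info: (event: string, context?: Context) => console.log(formatLine("info", event, context)),
  warn: (event: string, context?: Context) => console.warn(formatLine("warn", event, context)),
  error: (event: string, context?: Context) => console.error(formatLine("error", event, context)),
};

// Example call with a made-up event name and ticket ID.
log.error("consultation.email_failed", { ticketId: "T-1024", err: new Error("SMTP timeout") });
```

Keeping each entry on a single line is what allows line-oriented tools to parse the stream without any extra framing.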
Structured output is the prerequisite for shipping logs to any modern observability tool (Loki, Sentry, CloudWatch, Datadog). For now it works as-is with `docker compose logs --no-log-prefix app | jq` for ad-hoc inspection.

---

## 3. New Capability: FluxAI Conversation Analytics

This is the largest functional addition in the iteration.

### 3.1 The problem

The on-site engineering assistant (FluxAI) was already capable, but every conversation was lost the moment the visitor closed the tab. There was no way to answer questions like:

- How many people are actually using the assistant?
- Which industries are they coming from?
- What fraction of conversations lead to a consultation request?
- Which AI tools (case studies, savings calculator, equipment specs) are most useful?
- How long does a typical conversation last?
- Are visitors getting stuck at any particular point?

This iteration adds full persistence and a dedicated dashboard.

### 3.2 Data model

Two new database tables capture the full life-cycle of every conversation:

**`AiConversation`**: one row per visitor session.

| Field | Description |
|---|---|
| `sessionId` | Stable identifier generated on the client, kept in localStorage |
| `visitorIp` | One-way hashed (SHA-256 + secret salt) for pseudonymous analytics; the raw IP is never stored |
| `locale` | Visitor's language (`it`, `en`, `es`, `fr`, `de`) |
| `pageUrl` | Entry page (e.g. `cases/textile-drying`) |
| `industryLabel` | Detected automatically from the user's first message |
| `funnelStage` | One of `DISCOVERY`, `QUALIFY`, `RECOMMEND`, `HANDOFF` |
| `outcome` | `OPEN`, `CONSULTATION`, or `ABANDONED` |
| `messageCount`, `toolCallCount` | Activity counters |
| `estimatedSavingsPercent`, `productionVolume` | Captured when the AI runs its calculator |
| `signalId` | Foreign key to `OperationsSignal` if the chat converted to a consultation ticket |
| `startedAt`, `lastMessageAt`, `closedAt` | Timeline |

**`AiEvent`**: one row per individual event inside a conversation.

| Field | Description |
|---|---|
| `type` | `user_msg`, `ai_msg`, `tool_call`, `tool_result`, `error` |
| `payloadJson` | The serialized content, truncated to 8 KB |
| `toolName` | Which AI tool was invoked (when applicable) |
| `latencyMs` | Wall-clock time the AI took to respond |
| `tokensIn`, `tokensOut`, `cachedTokens` | OpenAI cost tracking |
| `createdAt` | Timestamp |

Both tables are indexed for the dashboard queries described in section 3.4.

### 3.3 Funnel stage detection

The system automatically advances the conversation through four stages based on the AI's behaviour:

1. **DISCOVERY**: initial state, before any industry is identified.
2. **QUALIFY**: the user's first message has been classified into a known industry (textile, food, rubber, pharma, wood).
3. **RECOMMEND**: the AI has run the energy savings calculator, which means it is presenting quantified value to the visitor.
4. **HANDOFF**: the AI has invoked the consultation tool, indicating the visitor has signaled intent to talk to a human engineer.

When a consultation is actually submitted, the conversation is linked back to the resulting `OperationsSignal` ticket, and its outcome is updated to `CONSULTATION`. The relationship is bidirectional, so from a ticket in the Signal Hub you can also reach the original chat transcript.

### 3.4 The dashboard

A new section was added to the HQ Command Center at `/hq-command/dashboard/conversations`. It surfaces:

**At-a-glance KPIs:**

- Total conversations
- Conversion rate (consultations divided by total)
- Average messages per chat
- Average tool calls per chat

**Funnel breakdown:** how many visitors are in each of the four stages, with percentages relative to the total.

**Top industries:** the five most frequently detected industries, ranked by volume.

**Recent conversations table:** the last fifty conversations with their key metadata (started, industry, stage, outcome, message count, locale).

**Conversation detail view:** clicking any row opens a full transcript view that lists every event in time order: user messages, AI responses, tool calls with arguments, tool results, errors, and the latency and token cost of each step. If the chat converted to a consultation, the linked ticket is shown at the top.

### 3.5 Cost monitoring readiness

The data model captures `tokensIn`, `tokensOut`, and `cachedTokens` on every AI response. Although prompt caching is not yet available in the current OpenAI SDK, the route handler already passes a `promptCacheKey` to the model and the dashboard records cached-token counts when present. When OpenAI publishes general availability of prompt caching, the system will benefit automatically without any further code changes, and the savings will be visible in the dashboard from day one.

### 3.6 Privacy posture

The system was designed with European data-protection norms in mind:

- The visitor's IP address is **never stored as-is**. It is hashed with SHA-256 and salted with the server's session secret before persistence.
- Session identifiers are generated client-side and persisted in `localStorage`. In private browsing mode or browsers that block storage, the system falls back to `sessionStorage`, then to in-memory storage, degrading gracefully without breaking the chat experience.
- The dashboard is gated behind the HQ Command Center authentication; it is never reachable without logging in.

---

## 4. Infrastructure Improvements

### 4.1 Database readiness probe

The `/api/health` endpoint previously returned a static 200 OK regardless of the actual system state. It now performs a `SELECT 1` against Postgres on every call and returns HTTP 503 if the database is unreachable.

This enables two important operations:

- **Docker health status and recovery:** the `app` service now has a `healthcheck` block that runs every 30 seconds. After three consecutive failures Docker marks the container `unhealthy`. Plain Docker Compose reports that status but does not restart the container on its own; automatic recovery needs an orchestrator (such as Docker Swarm) or a watchdog that acts on the status.
- **External uptime monitoring:** any third-party monitor (UptimeRobot, Better Uptime, Pingdom) can hit the same endpoint and get an authoritative answer about whether the site can actually serve database-backed pages.

### 4.2 Environment configuration template

The repository's `env` template was rewritten to document every required variable, the expected format, and how to generate strong values. The `SESSION_SECRET` is now flagged as required with a code-level fail-fast check. Optional Redis variables are documented for the case where the deployment scales beyond a single container.

### 4.3 Docker Compose health check

A health check block was added to the `app` service in `docker-compose.yml`:

```yaml
healthcheck:
  test: ["CMD-SHELL", "node -e \"fetch('http://localhost:3000/api/health')...\""]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 40s
```

This lets Docker flag the container as unhealthy, and lets an orchestrator or watchdog above it recycle the container, if the application loses its database connection or hangs.

---

## 5. Quality Assurance

### 5.1 Automated regression tests

We introduced an automated test suite covering the hardening modules. The suite is run via `npm run test:ai` and uses Node.js' built-in test runner, so no new dependencies are added to the project.
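For readers unfamiliar with the built-in runner, a test in this style could look like the sketch below. The `escapeHtml` function here is a simplified stand-in written for this example, not the code in `src/lib/escapeHtml.ts`, and the test names are illustrative.

```typescript
import { test } from "node:test";
import { strictEqual } from "node:assert";

// Simplified stand-in for the project's escaping helper (assumption).
function escapeHtml(value: unknown): string {
  if (value === null || value === undefined) return "";
  return String(value)
    .replace(/&/g, "&amp;") // first, so entities added below are not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

test("script tags are neutralised", () => {
  strictEqual(
    escapeHtml("<script>alert(1)</script>"),
    "&lt;script&gt;alert(1)&lt;/script&gt;",
  );
});

test("attribute breakout is neutralised", () => {
  strictEqual(escapeHtml('" onmouseover="x'), "&quot; onmouseover=&quot;x");
});

test("null and undefined become empty strings", () => {
  strictEqual(escapeHtml(null), "");
  strictEqual(escapeHtml(undefined), "");
});
```

Tests written this way need no external framework, which is consistent with the zero-new-dependencies goal stated above.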
Thirteen test cases are included, covering:

- HTML escaping kills script-tag injection
- HTML escaping defeats attribute-breakout payloads
- HTML escaping handles `null` and `undefined` cleanly
- File-type detector recognises PNG, JPEG, and MP4 by magic bytes
- File-type detector rejects HTML payloads renamed to image extensions
- Industry detector picks `textile` from textile-related phrasing
- Industry detector picks `food` from food-processing phrasing
- Industry detector returns null on off-topic prompts
- CSRF tokens verify successfully when fresh
- CSRF tokens fail verification when tampered with
- CSRF garbage inputs are rejected

These tests are deterministic, fast (under 100 milliseconds), and do not make any external network calls.

### 5.2 Production build verification

The full Next.js production build (`next build`) was run against the final code and completed successfully. All new routes appear in the build manifest:

- `/api/csrf` (dynamic)
- `/api/health` (dynamic)
- `/hq-command/dashboard/conversations` (dynamic)
- `/hq-command/dashboard/conversations/[id]` (dynamic)

TypeScript compilation passes with zero errors against the strict configuration used in production.

---

## 6. Deployment and Operations

### 6.1 Database migration

A single additive migration file is included:

```
prisma/migrations/20260526180000_add_indexes_and_ai_telemetry/
```

The migration:

- Creates the two new analytics tables
- Adds the six new indices
- Wires the foreign keys with `IF NOT EXISTS` guards for idempotency

It is **safe to run against production data**. It does not alter the columns or data of any existing table, does not drop any column, and uses `IF NOT EXISTS` on every statement so re-running it has no effect.

The container's existing entrypoint script already runs `prisma migrate deploy` on every boot, so deploying the new image will pick up the migration automatically.
### 6.2 Required environment variables

Before deploying to the VPS, confirm the following:

| Variable | Required | Notes |
|---|---|---|
| `SESSION_SECRET` | Yes | At least 32 characters. Generated via `openssl rand -base64 48`. The app will refuse to start without it. |
| `DATABASE_URL` | Yes | Existing |
| `OPENAI_API_KEY` | Yes | Existing |
| `SMTP_*` | Yes | Existing |
| `REDIS_URL`, `REDIS_TOKEN` | No | Only set when scaling to multiple containers |
| `NEXT_PUBLIC_APP_URL` | Yes | Existing |

### 6.3 Verification checklist after deploy

The following commands can be used to verify a successful deploy:

```bash
# Container health
docker compose ps                      # app status should be "healthy"
docker compose logs --tail=100 app     # no SESSION_SECRET errors

# Endpoint smoke tests
curl -s https://www.rf-flux.com/api/health
# expected: {"ok":true,"db":"up","latencyMs":N,"ts":"..."}

curl -I https://www.rf-flux.com/
# expected security headers: Content-Security-Policy, X-Frame-Options:DENY,
# X-Content-Type-Options:nosniff, Referrer-Policy, Permissions-Policy

# Database migration applied
# (the table name is quoted because psql lowercases unquoted identifiers)
docker compose exec postgres psql -U flux_user -d flux_db -c '\d "AiConversation"'
# expected: table description with all columns

# AI conversations populating
# After someone uses the chat:
docker compose exec postgres psql -U flux_user -d flux_db \
  -c "SELECT \"sessionId\", \"funnelStage\", \"outcome\", \"messageCount\" FROM \"AiConversation\" ORDER BY \"startedAt\" DESC LIMIT 5;"
```

The new dashboard is reachable at:

```
https://www.rf-flux.com/hq-command/dashboard/conversations
```

(Requires admin login, same as the rest of the HQ Command Center.)

---

## 7. Known Limitations and Recommendations

### 7.1 Items intentionally deferred

- **Content Security Policy nonces.** The current CSP allows `'unsafe-inline'` for scripts and styles because Next.js hydration depends on them. Migrating to nonce-based CSP would require changes to `next.config.ts` and the build pipeline. This is a known follow-up.
- **Prompt caching for the AI.** The OpenAI SDK does not yet expose prompt caching to consumers. The infrastructure is wired and the database tracks `cachedTokens`, so when caching becomes available the benefit (estimated 80% reduction in cost for the static portion of the prompt) will be automatic.
- **Email sequence automation, lead scoring, CRM integration.** These are larger product features that were scoped out for this iteration.

### 7.2 Recommended next steps

1. **Rotate the OpenAI API key.** The current key is present in earlier commits of the public repository, so it should be treated as exposed. Rotate it as soon as practical, and no later than the next deploy.
2. **Rotate the SMTP password.** Same reasoning as above.
3. **Move the `env` file out of version control.** A follow-up commit should convert `env` into `.env.example` (containing only placeholders) and add `env` to the `.gitignore`. The real `.env` is already gitignored, so this is the final step in eliminating secrets from the repository.
4. **Consider Sentry or equivalent error aggregation.** The structured logger introduced in this iteration is the prerequisite. Wiring it to a hosted aggregation service is a half-day task and substantially shortens time-to-detection for production errors.
5. **Schedule a 30-day review of the conversation dashboard data.** The analytics will be most useful after a month of real traffic. At that point we can identify the highest-impact funnel-stage improvements based on actual visitor behaviour.
---

## Appendix A: Files Modified or Created

**New files (12):**

| File | Purpose |
|---|---|
| `src/lib/csrf.ts` | CSRF token issuance and verification |
| `src/lib/escapeHtml.ts` | HTML escaping helpers |
| `src/lib/fileType.ts` | Magic-byte file-type detection |
| `src/lib/logger.ts` | Structured JSON logger |
| `src/lib/aiSessionId.ts` | Client-side session ID with privacy fallbacks |
| `src/types/cms.ts` | Shared CMS type definitions |
| `src/app/api/csrf/route.ts` | CSRF token issuance endpoint |
| `src/app/api/health/route.ts` | Database readiness probe |
| `src/app/hq-command/dashboard/conversations/page.tsx` | Analytics dashboard |
| `src/app/hq-command/dashboard/conversations/[id]/page.tsx` | Conversation detail view |
| `prisma/migrations/20260526180000_add_indexes_and_ai_telemetry/migration.sql` | Additive database migration |
| `tests/ai/golden.test.mjs` | Regression test suite |

**Modified files (18):**

| File | Change |
|---|---|
| `src/lib/session.ts` | Fail-fast on missing or weak `SESSION_SECRET` |
| `src/lib/rateLimit.ts` | Pluggable backend (in-memory or Redis) |
| `src/app/actions/clientAuth.ts` | Same fail-fast as `session.ts` |
| `src/app/api/chat/route.ts` | AI telemetry persistence and prompt cache key |
| `src/app/api/consultation/route.ts` | CSRF + Zod + escapeHtml |
| `src/app/api/public-upload/route.ts` | Magic-byte validation |
| `src/components/layout/NavBar.tsx` | Event-driven session check |
| `src/components/ai/SilentObserver.tsx` | Sends sessionId in transport body |
| `src/components/ai/ConsultationScheduler.tsx` | Sends CSRF token in form post |
| `src/components/sections/ApplicationsDashboard.tsx` | Strict types replace `any[]` |
| `src/components/sections/GlobalOperations.tsx` | Strict types replace `any[]` |
| `src/app/[locale]/parts/_components/AuthModal.tsx` | Dispatches session-changed event |
| `src/app/hq-command/dashboard/page.tsx` | Tile for the new conversations dashboard |
| `prisma/schema.prisma` | New models, indices, back-reference on `OperationsSignal` |
| `nginx/conf.d/flux.conf` | Security headers, upload rate-limit zone |
| `docker-compose.yml` | Health check, optional Redis env vars |
| `package.json` | `npm run test:ai` script |
| `env` | Documented `SESSION_SECRET` requirement and Redis variables |

**Removed files (1):**

| File | Reason |
|---|---|
| `src/components/sections/GlobalOperations_old.tsx` | Unreferenced legacy code (310 lines) |

---

## Appendix B: Quick Reference for the Sales Team

For team members who want to use the new analytics without engineering help:

1. Log in to the HQ Command Center at `https://www.rf-flux.com/hq-command`.
2. From the main dashboard, click the **FluxAI Conversations** tile (cyan sparkle icon, last position in the grid).
3. The top four cards show overall numbers: total conversations, conversion rate, average messages, average tool calls.
4. The two panels below show the funnel breakdown and the most common industries.
5. The table lists the last fifty conversations. Click **Open** on any row to see the full transcript.
6. Conversations that converted to a consultation ticket display the ticket ID in green at the top of the detail view.

The data is live: each visit to the page shows the latest conversations, with no manual refresh step needed between visits.

---

*End of report.*