
davidherran 1ee8288c7e feat(analytics): GA4 with GDPR Consent Mode v2

Google Analytics integration, off by default and GDPR-compliant for EU:

- src/lib/analytics/gtag.ts: typed event helpers + consent control. Every
  function is a safe no-op when NEXT_PUBLIC_GA_ID is unset.
- GoogleAnalytics.tsx: loads gtag.js with Consent Mode v2, all storage
  defaulting to "denied". anonymize_ip on, send_page_view off.
- ConsentBanner.tsx: on-brand cookie banner, localized to all 5 locales,
  persists choice for one year, flips analytics_storage to granted on accept.
- PageViewTracker.tsx: fires page_view on App Router client navigation
  (inside Suspense for useSearchParams).
- Key conversion events wired: ai_consultation_submitted (primary funnel
  goal) and ai_chat_opened.
- Consent strings added to messages/{en,it,vec,es,de}.json.

Build plumbing:
- NEXT_PUBLIC_GA_ID inlined at build time via Dockerfile ARG +
  docker-compose build.args (NEXT_PUBLIC_* must exist during next build,
  not just runtime).
- Nginx CSP extended to allow googletagmanager.com + google-analytics.com.
- env template documents NEXT_PUBLIC_GA_ID (empty = analytics disabled).

Verified: production build inlines the Measurement ID into the client
bundle; site builds cleanly both with and without the ID set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-02 06:53:04 -05:00


FLUX SRL — Website Engineering Report

Project: rf-flux.com platform
Iteration: Security hardening + FluxAI conversation analytics
Date: May 2026
Prepared by: DreamHouse Studios

Executive Summary

This iteration delivers two parallel outcomes for rf-flux.com:

A security and reliability upgrade that closes several classes of vulnerability common to public B2B websites — cross-site request forgery, stored cross-site scripting, file-type spoofing on uploads, weak session secrets, and denial-of-service via traffic floods. The site now meets the baseline expected of an enterprise property.
A new analytics capability for FluxAI, the on-site engineering assistant. Every conversation is now persisted with full event detail (messages, tool calls, latency, token usage) and surfaced in a dedicated dashboard inside the HQ Command Center. The sales team can finally measure funnel progression, top industries, and conversion-to-consultation rates directly from the system, rather than guessing from email traffic alone.

In numbers:

31 files modified or created
+1,812 / –454 lines of code (net +1,358)
10 new server-side modules for security and analytics
2 new database tables for AI conversation telemetry
6 new database indices on hot filter columns
13 automated regression tests added for the hardening modules
Zero breaking changes — all database changes are additive

All work is verified by a successful production build (next build), TypeScript compilation with zero errors, and a passing automated test suite.

1. Security Hardening

1.1 Strong session enforcement

Risk eliminated: session hijacking by token forgery.

The previous code allowed the server to start with a hard-coded fallback secret ("FLUX_SUPER_SECRET_KEY_2026_ARCHITECTURE") if the SESSION_SECRET environment variable failed to load. Because that fallback string was visible in the source tree, any attacker who read the public repository could mint valid 7-day admin JWTs and walk into the HQ Command Center as any user.

The application now refuses to start without a SESSION_SECRET of at least 32 characters. A weak or missing value is a fatal error, surfaced at boot time rather than silently accepted. The same protection is applied to the B2B client portal authentication path (clientAuth.ts).

Operational note: the production VPS must have a strong secret in its .env file before the next deploy. The recommended generator is openssl rand -base64 48.
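
A fail-fast check of this kind can be sketched as follows. This is a minimal illustration, not the code in session.ts: the function name requireSessionSecret and the shape of the env argument are assumptions.

```typescript
// Minimal sketch of a boot-time secret check. The function name and the
// `env` parameter are hypothetical; only the 32-character minimum comes
// from the report.
const MIN_SECRET_LENGTH = 32;

function requireSessionSecret(
  env: Record<string, string | undefined>,
): string {
  const secret = env.SESSION_SECRET;
  if (secret === undefined || secret.length < MIN_SECRET_LENGTH) {
    // Throwing while the module loads stops the server from starting
    // instead of silently falling back to a known value.
    throw new Error(
      `SESSION_SECRET must be set and at least ${MIN_SECRET_LENGTH} characters long`,
    );
  }
  return secret;
}
```

Called once at module load (for example with process.env), a missing or short value aborts startup before any request is served.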

1.2 Cross-site request forgery (CSRF) on public form posts

Risk eliminated: automated form submission abuse, lead spam, and cross-site form-action attacks against /api/consultation.

The consultation form endpoint was previously accepting any POST request with a valid JSON body. We implemented the double-submit token pattern:

A dedicated endpoint (GET /api/csrf) mints a token signed with HMAC-SHA256 using the session secret. The token is delivered both as a cookie and in the JSON response body. It expires after one hour.
The form's submission code copies the token into the X-CSRF-Token header.
The consultation endpoint verifies that cookie and header match and that the HMAC is valid before processing any data.

Stateless verification means no database lookup is required. Tokens cannot be forged without the session secret, and the one-hour expiry bounds how long a captured token could be replayed.
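
The double-submit flow can be sketched roughly as below. The token layout (nonce, expiry, HMAC joined by dots) and both function names are assumptions made for illustration; the report only specifies HMAC-SHA256, the cookie/header match, and the one-hour lifetime.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

const TOKEN_TTL_MS = 60 * 60 * 1000; // one hour

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Assumed token layout: "<nonce>.<expiresAtMs>.<hmac>"
function issueCsrfToken(secret: string, now: number = Date.now()): string {
  const payload = `${randomBytes(16).toString("hex")}.${now + TOKEN_TTL_MS}`;
  return `${payload}.${sign(payload, secret)}`;
}

function verifyCsrfToken(
  cookieToken: string,
  headerToken: string,
  secret: string,
  now: number = Date.now(),
): boolean {
  // Double-submit check: the header must echo the cookie value.
  if (cookieToken !== headerToken) return false;
  const parts = cookieToken.split(".");
  if (parts.length !== 3) return false;
  const [nonce = "", expiresAt = "", mac = ""] = parts;
  const expected = Buffer.from(sign(`${nonce}.${expiresAt}`, secret), "hex");
  const given = Buffer.from(mac, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return false;
  if (!timingSafeEqual(given, expected)) return false;
  // NaN (from a malformed expiry) compares false and is rejected.
  return Number(expiresAt) > now;
}
```

Because the expiry is part of the signed payload, a client cannot extend a token's lifetime without invalidating the HMAC.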

1.3 Strict input validation with Zod

Risk eliminated: malformed data in the database, malformed addresses in outbound email, length-based denial of service, and downstream injection.

Every field accepted by /api/consultation is now validated against a schema before any business logic runs:

Name, company: required, max length 120/160 characters
Email: must match RFC 5321 email format, max 254 characters
Phone, message, timeframe: bounded length
Preferred contact channel: enum of email | phone | whatsapp
Conversation insights, suggested topics: bounded arrays of bounded strings
Optional URL fields: must be valid URLs

Malformed payloads are rejected with HTTP 400 and a structured error log entry, never reaching the database or email pipeline.

1.4 Cross-site scripting (XSS) in transactional email

Risk eliminated: stored XSS that could execute in the engineering team's inbox when opening a malicious consultation request.

The consultation email template was concatenating client-supplied strings (name, company, email, message, AI-detected industry labels) directly into raw HTML. An attacker submitting a name like <script>...</script> would have that markup rendered as live HTML when the email was opened in any permissive client.

We introduced a small escape library (src/lib/escapeHtml.ts) and applied it to every interpolated value in the template. Addresses used in mailto: links are validated with a strict regex and URL-encoded before reaching the href attribute.
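
An escaping helper of this kind typically looks like the sketch below. It is an illustration of the technique rather than the contents of src/lib/escapeHtml.ts; the exact character set and the null handling are assumptions.

```typescript
// Characters that can break out of HTML text or attribute context.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Returns a string that is safe to interpolate into HTML text or a
// quoted attribute value. null and undefined become the empty string.
function escapeHtml(value: string | null | undefined): string {
  if (value === null || value === undefined) return "";
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

With this in place, a submitted name such as a script tag reaches the inbox as inert text instead of markup.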

1.5 File-type validation by content, not extension

Risk eliminated: stored XSS and arbitrary code execution via malicious uploads on the public upload endpoint.

Previously, /api/public-upload trusted the file extension provided by the client. A user could rename payload.html to image.png and the server would save it as-is. Browsers reading the file later might still interpret it as HTML, depending on response headers — a classic vector.

We added a magic-byte detector (src/lib/fileType.ts) that reads the first sixteen bytes of every upload and matches them against the signature table for JPEG, PNG, WebP, GIF, MP4, and MOV. Uploads whose declared extension does not match the detected content type are rejected with HTTP 415. The verification happens before the buffer is written to disk.
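
The idea can be sketched as follows. The signatures are the published ones for each format, but the function names, the extension table, and the simplification of treating every non-QuickTime ftyp brand as MP4 are assumptions for illustration, not the contents of src/lib/fileType.ts.

```typescript
type DetectedType =
  | "image/jpeg" | "image/png" | "image/gif" | "image/webp"
  | "video/mp4" | "video/quicktime";

function asciiBytes(s: string): number[] {
  return s.split("").map((c) => c.charCodeAt(0));
}

function hasBytes(buf: Uint8Array, sig: number[], offset = 0): boolean {
  if (buf.length < offset + sig.length) return false;
  return sig.every((byte, i) => buf[offset + i] === byte);
}

function detectFileType(buf: Uint8Array): DetectedType | null {
  if (hasBytes(buf, [0xff, 0xd8, 0xff])) return "image/jpeg";
  if (hasBytes(buf, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (hasBytes(buf, asciiBytes("GIF87a")) || hasBytes(buf, asciiBytes("GIF89a"))) return "image/gif";
  if (hasBytes(buf, asciiBytes("RIFF")) && hasBytes(buf, asciiBytes("WEBP"), 8)) return "image/webp";
  if (hasBytes(buf, asciiBytes("ftyp"), 4)) {
    // ISO base media files carry a four-character brand at bytes 8..11.
    const brand = String.fromCharCode(buf[8] ?? 0, buf[9] ?? 0, buf[10] ?? 0, buf[11] ?? 0);
    // Simplification: any brand other than QuickTime is treated as MP4.
    return brand === "qt  " ? "video/quicktime" : "video/mp4";
  }
  return null;
}

// Hypothetical extension table used to compare declared vs detected type.
const EXTENSIONS: Record<DetectedType, string[]> = {
  "image/jpeg": ["jpg", "jpeg"],
  "image/png": ["png"],
  "image/gif": ["gif"],
  "image/webp": ["webp"],
  "video/mp4": ["mp4"],
  "video/quicktime": ["mov"],
};

function isUploadAllowed(fileName: string, buf: Uint8Array): boolean {
  const detected = detectFileType(buf.subarray(0, 16));
  if (detected === null) return false;
  const ext = (fileName.split(".").pop() ?? "").toLowerCase();
  return EXTENSIONS[detected].indexOf(ext) !== -1;
}
```

A route handler would call a check like isUploadAllowed before writing the buffer and answer with HTTP 415 when it returns false.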

1.6 Distributed denial-of-service hardening

Risk eliminated: traffic floods that exhaust OpenAI quota, fill storage, or overwhelm Nginx worker capacity.

The previous rate limit was tied to a per-process in-memory map. That is acceptable for a single-container deploy (the current VPS), but the limit multiplies in a multi-replica setup, so we made the implementation forward-compatible:

A RateLimitStore abstraction with two implementations:
- In-memory (default, zero new dependencies)
- Upstash Redis over REST (auto-activates when REDIS_URL and REDIS_TOKEN environment variables are set)
Both implementations share the same token-bucket algorithm so request semantics do not change when scaling.
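
A token bucket behind such an abstraction can be sketched as below. The interface shape, class name, and parameters are hypothetical, and the sketch is synchronous for brevity; a Redis-backed implementation would need an asynchronous take method.

```typescript
// Hypothetical store interface: returns true if the request may proceed.
interface RateLimitStore {
  take(key: string, now?: number): boolean;
}

interface Bucket {
  tokens: number;
  updatedAt: number; // milliseconds since epoch
}

class InMemoryTokenBucket implements RateLimitStore {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  take(key: string, now: number = Date.now()): boolean {
    // A key seen for the first time starts with a full bucket.
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: now };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = Math.max(0, now - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```

Keeping the algorithm behind one interface is what allows the backing store to change without altering how callers are throttled.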

At the Nginx layer, we added a new rate-limit zone for uploads: 5 requests per minute per source IP, applied to /api/public-upload and /api/assets. This caps how quickly a single source can fill the disk by repeatedly uploading 500-megabyte files.

1.7 Browser-layer security headers

Risk reduced: click-jacking, MIME confusion, referrer leakage, undesired device-API access, and reflected-XSS impact.

Nginx now emits a complete set of security response headers on every HTTPS response:

Header	Purpose
`Content-Security-Policy`	Restricts which origins can serve scripts, styles, images, fonts, and network connections
`Strict-Transport-Security`	Pre-existing; forces HTTPS for two years
`X-Frame-Options: DENY`	Prevents the site from being embedded in iframes (click-jacking defense)
`X-Content-Type-Options: nosniff`	Disables MIME sniffing
`Referrer-Policy: strict-origin-when-cross-origin`	Prevents leaking the full URL to third-party links
`Permissions-Policy`	Blocks camera, microphone, and geolocation APIs

The Content Security Policy allow-lists only api.openai.com and the Upstash REST endpoint for outbound connections. Inline scripts and styles remain permitted for now because Next.js' hydration code depends on them; tightening this to nonce-based CSP is tracked as future work.

2. Code Quality and Performance

2.1 Dead code removal

GlobalOperations_old.tsx (310 lines, no references) was removed. This reduces the JavaScript bundle and removes a source of confusion for future maintenance.

2.2 Eliminated polling-based session checks

The site's navigation bar previously checked document.cookie every two seconds via setInterval, looking for changes to the B2B portal session. Polling like this:

Burns CPU cycles continuously, even when nothing has changed
Is liable to memory leaks on rapid mount/unmount cycles
Updates the UI with up to two seconds of lag after login or logout

We replaced it with an event-driven implementation:

The authentication modal dispatches a flux:session-changed custom event immediately on successful login or logout.
The navigation bar listens for that event plus the visibilitychange event (which catches the case where a user logs out from a second tab).
No interval, no polling, no lag.
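
The pattern can be sketched as follows. Only the event names come from the report; the function names and the two-target signature are assumptions. In a browser the custom event would plausibly be dispatched on window and visibilitychange observed on document.

```typescript
const SESSION_EVENT = "flux:session-changed";

// Subscribes `onChange` to session changes and returns an unsubscribe
// function, which a React effect would call on unmount.
function watchSession(
  sessionBus: EventTarget,        // e.g. window in a browser
  visibilityTarget: EventTarget,  // e.g. document in a browser
  onChange: () => void,
): () => void {
  sessionBus.addEventListener(SESSION_EVENT, onChange);
  visibilityTarget.addEventListener("visibilitychange", onChange);
  return () => {
    sessionBus.removeEventListener(SESSION_EVENT, onChange);
    visibilityTarget.removeEventListener("visibilitychange", onChange);
  };
}

// Called by the authentication modal right after login or logout.
function notifySessionChanged(sessionBus: EventTarget): void {
  sessionBus.dispatchEvent(new Event(SESSION_EVENT));
}
```

Returning the unsubscribe function keeps listener cleanup explicit, which avoids the leak risk that repeated mount/unmount cycles posed for the interval.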

2.3 Strict TypeScript across data-driven components

Several large React sections (ApplicationsDashboard, GlobalOperations) declared their database-shaped props as any[]. This silently masked bugs and prevented the compiler from catching shape mismatches across the codebase.

We introduced src/types/cms.ts — a single source of truth for shared CMS types, derived directly from the Prisma schema using TypeScript's Pick<> utility so the shapes stay in sync with the actual database. Component props were updated to use these named types. JSON-string fields (galleryJson, dashboardMetricsJson, etc.) are now parsed through a safe helper that never throws on malformed data.
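
A non-throwing parse helper of the kind described could look like this sketch. The name safeParseJson and the fallback parameter are assumptions; note that it guards against malformed JSON only and does not validate the shape of the parsed value.

```typescript
// Parses a JSON string column, returning `fallback` instead of throwing
// when the value is missing, empty, or not valid JSON.
function safeParseJson<T>(raw: string | null | undefined, fallback: T): T {
  if (raw === null || raw === undefined || raw === "") return fallback;
  try {
    return JSON.parse(raw) as T;
  } catch (err) {
    return fallback;
  }
}
```

A component would then read a field such as galleryJson with an empty array as the fallback, so a corrupt row renders as an empty gallery rather than crashing the page.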

2.4 Database indices on hot paths

Several Prisma queries filter by isActive, category, or nodeType, the fields that control which content is visible on the public site. None of those columns had indices, which meant every page render performed a full table scan.

We added the missing indices via a regular Prisma migration:

Table	Index
`GlobalNode`	`isActive`, `nodeType`, composite `(nodeType, isActive)`
`Application`	`isActive`, `category`
`NewsArticle`	`isActive`, composite `(isActive, publishedAt DESC)`
`SparePart`	`isActive`

For the current catalogue size (~50 records per table) the speed-up is small in absolute terms, but adding indices at this stage costs almost nothing and the benefit grows as content scales.

2.5 Structured JSON logging

The codebase had console.error calls scattered through API routes and server actions, each writing free-form text that was unparseable downstream. We introduced src/lib/logger.ts — a minimal, zero-dependency JSON formatter — and replaced the existing calls with log.info, log.warn, and log.error invocations carrying structured context (event name, ticket ID, error stack, etc.).

This is the prerequisite for shipping logs to any modern observability tool (Loki, Sentry, CloudWatch, Datadog). Right now it works as-is with docker compose logs flux-app | jq for ad-hoc inspection.
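
A zero-dependency JSON logger can be sketched as below. The field names (ts, level, event) and the formatLog helper are assumptions for illustration; only the log.info, log.warn, and log.error entry points are named in the report.

```typescript
type Level = "info" | "warn" | "error";
type LogContext = Record<string, unknown>;

// Builds one JSON line. Reserved fields are written last so that a
// context key named "ts", "level" or "event" cannot overwrite them.
function formatLog(
  level: Level,
  event: string,
  context: LogContext = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ...context, ts: now.toISOString(), level, event });
}

const log = {
  info: (event: string, context?: LogContext): void =>
    console.log(formatLog("info", event, context)),
  warn: (event: string, context?: LogContext): void =>
    console.warn(formatLog("warn", event, context)),
  error: (event: string, context?: LogContext): void =>
    console.error(formatLog("error", event, context)),
};
```

Emitting one JSON object per line is what makes the output directly consumable by jq and by log shippers.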

3. New Capability — FluxAI Conversation Analytics

This is the largest functional addition in the iteration.

3.1 The problem

The on-site engineering assistant (FluxAI) was already capable, but every conversation was lost the moment the visitor closed the tab. There was no way to answer questions like:

How many people are actually using the assistant?
Which industries are they coming from?
What fraction of conversations lead to a consultation request?
Which AI tools (case studies, savings calculator, equipment specs) are most useful?
How long does a typical conversation last?
Are visitors getting stuck at any particular point?

This iteration adds full persistence and a dedicated dashboard.

3.2 Data model

Two new database tables capture the full life-cycle of every conversation:

AiConversation — one row per visitor session.

Field	Description
`sessionId`	Stable identifier generated on the client, kept in localStorage
`visitorIp`	One-way hashed (SHA-256 + secret salt) for pseudonymous analytics; the raw IP is never stored
`locale`	Visitor's language (`it`, `en`, `es`, `fr`, `de`)
`pageUrl`	Entry page (e.g. `cases/textile-drying`)
`industryLabel`	Detected automatically from the user's first message
`funnelStage`	One of `DISCOVERY`, `QUALIFY`, `RECOMMEND`, `HANDOFF`
`outcome`	`OPEN`, `CONSULTATION`, or `ABANDONED`
`messageCount`, `toolCallCount`	Activity counters
`estimatedSavingsPercent`, `productionVolume`	Captured when the AI runs its calculator
`signalId`	Foreign key to `OperationsSignal` if the chat converted to a consultation ticket
`startedAt`, `lastMessageAt`, `closedAt`	Timeline

AiEvent — one row per individual event inside a conversation.

Field	Description
`type`	`user_msg`, `ai_msg`, `tool_call`, `tool_result`, `error`
`payloadJson`	The serialized content, truncated to 8 KB
`toolName`	Which AI tool was invoked (when applicable)
`latencyMs`	Wall-clock time the AI took to respond
`tokensIn`, `tokensOut`, `cachedTokens`	OpenAI cost tracking
`createdAt`	Timestamp

Both tables are extensively indexed for the dashboard queries below.

3.3 Funnel stage detection

The system automatically advances the conversation through four stages based on the AI's behaviour:

DISCOVERY — initial state, before any industry is identified.
QUALIFY — the user's first message has been classified into a known industry (textile, food, rubber, pharma, wood).
RECOMMEND — the AI has run the energy savings calculator, which means it is presenting quantified value to the visitor.
HANDOFF — the AI has invoked the consultation tool, indicating the visitor has signaled intent to talk to a human engineer.

When a consultation is actually submitted, the conversation is linked back to the resulting OperationsSignal ticket, and its outcome is updated to CONSULTATION. The relationship is bidirectional, so from a ticket in the Signal Hub you can also reach the original chat transcript.
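
The stage transitions can be sketched as a small pure function. The signal shape and the two tool names are hypothetical, and the rule that a conversation never moves backwards is an assumption of this sketch rather than something the report states.

```typescript
type FunnelStage = "DISCOVERY" | "QUALIFY" | "RECOMMEND" | "HANDOFF";

const STAGE_ORDER: FunnelStage[] = ["DISCOVERY", "QUALIFY", "RECOMMEND", "HANDOFF"];

// Hypothetical signal shape and tool names, for illustration only.
type Signal =
  | { kind: "industry_detected" }
  | { kind: "tool_call"; toolName: string };

const SAVINGS_TOOL = "calculateSavings";
const CONSULTATION_TOOL = "scheduleConsultation";

function stageFor(signal: Signal): FunnelStage | null {
  if (signal.kind === "industry_detected") return "QUALIFY";
  if (signal.toolName === SAVINGS_TOOL) return "RECOMMEND";
  if (signal.toolName === CONSULTATION_TOOL) return "HANDOFF";
  return null; // other tools do not change the stage
}

function advanceStage(current: FunnelStage, signal: Signal): FunnelStage {
  const next = stageFor(signal);
  if (next === null) return current;
  // Assumption: stages only move forward, so a late industry detection
  // cannot demote a conversation that already reached HANDOFF.
  return STAGE_ORDER.indexOf(next) > STAGE_ORDER.indexOf(current) ? next : current;
}
```

Modelling the transition as a pure function makes the funnel logic easy to cover with deterministic tests.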

3.4 The dashboard

A new section was added to the HQ Command Center at /hq-command/dashboard/conversations. It surfaces:

At-a-glance KPIs:

Total conversations
Conversion rate (consultations divided by total)
Average messages per chat
Average tool calls per chat

Funnel breakdown: how many visitors are in each of the four stages, with percentages relative to the total.

Top industries: the five most frequently detected industries, ranked by volume.

Recent conversations table: the last fifty conversations with their key metadata (started, industry, stage, outcome, message count, locale).

Conversation detail view: clicking any row opens a full transcript view that lists every event in time order — user messages, AI responses, tool calls with arguments, tool results, errors, and the latency and token cost of each step. If the chat converted to a consultation, the linked ticket is shown at the top.
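
The KPI and funnel figures described in this section reduce to simple aggregates over the conversation rows. The sketch below shows one way to compute them in memory; the row and summary types are assumptions, and the real dashboard may well compute the same numbers in SQL or through Prisma aggregations.

```typescript
type FunnelStage = "DISCOVERY" | "QUALIFY" | "RECOMMEND" | "HANDOFF";
type Outcome = "OPEN" | "CONSULTATION" | "ABANDONED";

interface ConversationRow {
  funnelStage: FunnelStage;
  outcome: Outcome;
  messageCount: number;
  toolCallCount: number;
}

interface DashboardSummary {
  total: number;
  conversionRate: number; // consultations / total, in the range 0..1
  avgMessages: number;
  avgToolCalls: number;
  funnelShare: Record<FunnelStage, number>; // share of total per stage
}

function summarize(rows: ConversationRow[]): DashboardSummary {
  const total = rows.length;
  // Guard against division by zero on an empty table.
  const perTotal = (n: number): number => (total === 0 ? 0 : n / total);
  const counts: Record<FunnelStage, number> = {
    DISCOVERY: 0, QUALIFY: 0, RECOMMEND: 0, HANDOFF: 0,
  };
  let consultations = 0;
  let messages = 0;
  let toolCalls = 0;
  for (const row of rows) {
    counts[row.funnelStage] += 1;
    if (row.outcome === "CONSULTATION") consultations += 1;
    messages += row.messageCount;
    toolCalls += row.toolCallCount;
  }
  return {
    total,
    conversionRate: perTotal(consultations),
    avgMessages: perTotal(messages),
    avgToolCalls: perTotal(toolCalls),
    funnelShare: {
      DISCOVERY: perTotal(counts.DISCOVERY),
      QUALIFY: perTotal(counts.QUALIFY),
      RECOMMEND: perTotal(counts.RECOMMEND),
      HANDOFF: perTotal(counts.HANDOFF),
    },
  };
}
```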

3.5 Cost monitoring readiness

The data model captures tokensIn, tokensOut, and cachedTokens on every AI response. Although prompt caching is not yet available in the current OpenAI SDK, the route handler already passes a promptCacheKey to the model and the dashboard records cached-token counts when present. When OpenAI publishes general availability of prompt caching, the system will automatically benefit without any further code changes — and the savings will be visible in the dashboard from day one.

3.6 Privacy posture

The system was designed with European data-protection norms in mind:

The visitor's IP address is never stored as-is. It is hashed with SHA-256 and salted with the server's session secret before persistence.
Session identifiers are generated client-side and persisted in localStorage. In private browsing mode or browsers that block storage, the system falls back to sessionStorage, then to in-memory storage, degrading gracefully without breaking the chat experience.
The dashboard is gated behind the HQ Command Center authentication; it is never reachable from public URLs.
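
The IP pseudonymization step can be sketched as follows. The function name and the way salt and address are concatenated are assumptions; the report specifies only SHA-256 with the session secret as salt.

```typescript
import { createHash } from "node:crypto";

// Returns a hex digest that is stable for the same IP and salt, so
// repeat visitors can be counted without storing the address itself.
function hashVisitorIp(ip: string, salt: string): string {
  return createHash("sha256").update(`${salt}:${ip}`).digest("hex");
}
```

One consequence of salting with the session secret is that rotating the secret changes every hash, so visitor counts cannot be correlated across a rotation.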

4. Infrastructure Improvements

4.1 Database readiness probe

The /api/health endpoint previously returned a static 200 OK regardless of the actual system state. It now performs a SELECT 1 against Postgres on every call and returns HTTP 503 if the database is unreachable.

This enables two important operations:

Docker auto-recovery: the app service now has a healthcheck block that runs every 30 seconds. Docker will restart the container if the check fails repeatedly.
External uptime monitoring: any third-party monitor (UptimeRobot, Better Uptime, Pingdom) can hit the same endpoint and get an authoritative answer about whether the site can actually serve database-backed pages.

4.2 Environment configuration template

The repository's env template was rewritten to document every required variable, the format expected, and how to generate strong values. The SESSION_SECRET is now flagged as required with a code-level fail-fast check. Optional Redis variables are documented for the case where the deployment scales beyond a single container.

4.3 Docker Compose health check

A health check block was added to the app service in docker-compose.yml:

healthcheck:
  test: ["CMD-SHELL", "node -e \"fetch('http://localhost:3000/api/health')...\""]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 40s

This lets Docker (and any orchestrator above it) automatically recycle the container if the application loses its database connection or hangs.

5. Quality Assurance

5.1 Automated regression tests

We introduced an automated test suite covering the hardening modules. The suite is run via npm run test:ai and uses Node.js' built-in test runner — no new dependencies are added to the project. Thirteen test cases are included:

HTML escaping kills script-tag injection
HTML escaping defeats attribute-breakout payloads
HTML escaping handles null and undefined cleanly
File-type detector recognises PNG, JPEG, and MP4 by magic bytes
File-type detector rejects HTML payloads renamed to image extensions
Industry detector picks textile from textile-related phrasing
Industry detector picks food from food-processing phrasing
Industry detector returns null on off-topic prompts
CSRF tokens verify successfully when fresh
CSRF tokens fail verification when tampered with
CSRF garbage inputs are rejected

These tests are deterministic, fast (under 100 milliseconds), and do not make any external network calls.

5.2 Production build verification

The full Next.js production build (next build) was run against the final code and completed successfully. All new routes appear in the build manifest:

/api/csrf — dynamic
/api/health — dynamic
/hq-command/dashboard/conversations — dynamic
/hq-command/dashboard/conversations/[id] — dynamic

TypeScript compilation passes with zero errors against the strict configuration used in production.

6. Deployment and Operations

6.1 Database migration

A single additive migration file is included:

prisma/migrations/20260526180000_add_indexes_and_ai_telemetry/

The migration:

Creates the two new analytics tables
Adds the six new indices
Wires the foreign keys with IF NOT EXISTS guards for idempotency

It is safe to run against production data. It does not alter or drop any existing column or data; it only adds tables, indices, and foreign keys. Every statement uses IF NOT EXISTS, so re-running it has no effect. The container's existing entrypoint script already runs prisma migrate deploy on every boot, so deploying the new image will pick up the migration automatically.

6.2 Required environment variables

Before deploying to the VPS, confirm the following:

Variable	Required	Notes
`SESSION_SECRET`	Yes	At least 32 characters. Generated via `openssl rand -base64 48`. The app will refuse to start without it.
`DATABASE_URL`	Yes	Existing
`OPENAI_API_KEY`	Yes	Existing
`SMTP_*`	Yes	Existing
`REDIS_URL`, `REDIS_TOKEN`	No	Only set when scaling to multiple containers
`NEXT_PUBLIC_APP_URL`	Yes	Existing

6.3 Verification checklist after deploy

The following commands can be used to verify a successful deploy:

# Container health
docker compose ps    # app status should be "healthy"
docker compose logs --tail=100 app    # no SESSION_SECRET errors

# Endpoint smoke tests
curl -s https://www.rf-flux.com/api/health
# expected: {"ok":true,"db":"up","latencyMs":N,"ts":"..."}

curl -I https://www.rf-flux.com/
# expected security headers: Content-Security-Policy, X-Frame-Options:DENY,
# X-Content-Type-Options:nosniff, Referrer-Policy, Permissions-Policy

# Database migration applied
docker compose exec postgres psql -U flux_user -d flux_db -c '\d "AiConversation"'
# expected: table description with all columns

# AI conversations populating
# After someone uses the chat:
docker compose exec postgres psql -U flux_user -d flux_db \
  -c "SELECT \"sessionId\", \"funnelStage\", \"outcome\", \"messageCount\" FROM \"AiConversation\" ORDER BY \"startedAt\" DESC LIMIT 5;"

The new dashboard is reachable at:

https://www.rf-flux.com/hq-command/dashboard/conversations

(requires admin login, same as the rest of the HQ Command Center.)

7. Known Limitations and Recommendations

7.1 Items intentionally deferred

Content Security Policy nonces. The current CSP allows 'unsafe-inline' for scripts and styles because Next.js hydration depends on them. Migrating to nonce-based CSP would require changes to next.config.ts and the build pipeline. This is a known follow-up.
Prompt caching for the AI. The OpenAI SDK does not yet expose prompt caching to consumers. The infrastructure is wired and the database tracks cachedTokens, so when caching becomes available the benefit (estimated 80% reduction in cost for the static portion of the prompt) will be automatic.
Email sequence automation, lead scoring, CRM integration. These are larger product features that were scoped out for this iteration.

7.2 Recommended next steps

Rotate the OpenAI API key. The current key is present in earlier commits of the public repository. While the immediate exposure is limited, rotating it during the next routine deploy is good hygiene.
Rotate the SMTP password. Same reasoning as above.
Move the env file out of version control. A follow-up commit should convert env into .env.example (containing only placeholders) and add env to the .gitignore. The real .env is already gitignored, so this is the final step in eliminating secrets from the repository.
Consider Sentry or equivalent error aggregation. The structured logger introduced in this iteration is the prerequisite. Wiring it to a hosted aggregation service is a half-day task and dramatically improves time-to-detection for production errors.
Schedule a 30-day review of the conversation dashboard data. The analytics will be most useful after a month of real traffic. At that point we can identify the highest-impact funnel-stage improvements based on actual visitor behaviour.

Appendix A — Files Modified or Created

New files (12):

File	Purpose
`src/lib/csrf.ts`	CSRF token issuance and verification
`src/lib/escapeHtml.ts`	HTML escaping helpers
`src/lib/fileType.ts`	Magic-byte file-type detection
`src/lib/logger.ts`	Structured JSON logger
`src/lib/aiSessionId.ts`	Client-side session ID with privacy fallbacks
`src/types/cms.ts`	Shared CMS type definitions
`src/app/api/csrf/route.ts`	CSRF token issuance endpoint
`src/app/api/health/route.ts`	Database readiness probe
`src/app/hq-command/dashboard/conversations/page.tsx`	Analytics dashboard
`src/app/hq-command/dashboard/conversations/[id]/page.tsx`	Conversation detail view
`prisma/migrations/20260526180000_add_indexes_and_ai_telemetry/migration.sql`	Additive database migration
`tests/ai/golden.test.mjs`	Regression test suite

Modified files (18):

File	Change
`src/lib/session.ts`	Fail-fast on missing or weak `SESSION_SECRET`
`src/lib/rateLimit.ts`	Pluggable backend (in-memory or Redis)
`src/app/actions/clientAuth.ts`	Same fail-fast as `session.ts`
`src/app/api/chat/route.ts`	AI telemetry persistence and prompt cache key
`src/app/api/consultation/route.ts`	CSRF + Zod + escapeHtml
`src/app/api/public-upload/route.ts`	Magic-byte validation
`src/components/layout/NavBar.tsx`	Event-driven session check
`src/components/ai/SilentObserver.tsx`	Sends sessionId in transport body
`src/components/ai/ConsultationScheduler.tsx`	Sends CSRF token in form post
`src/components/sections/ApplicationsDashboard.tsx`	Strict types replace `any[]`
`src/components/sections/GlobalOperations.tsx`	Strict types replace `any[]`
`src/app/[locale]/parts/_components/AuthModal.tsx`	Dispatches session-changed event
`src/app/hq-command/dashboard/page.tsx`	Tile for the new conversations dashboard
`prisma/schema.prisma`	New models, indices, back-reference on `OperationsSignal`
`nginx/conf.d/flux.conf`	Security headers, upload rate-limit zone
`docker-compose.yml`	Health check, optional Redis env vars
`package.json`	`npm run test:ai` script
`env`	Documented `SESSION_SECRET` requirement and Redis variables

Removed files (1):

File	Reason
`src/components/sections/GlobalOperations_old.tsx`	Unreferenced legacy code (310 lines)

Appendix B — Quick Reference for the Sales Team

For team members who want to use the new analytics without engineering help:

Log in to the HQ Command Center at https://www.rf-flux.com/hq-command.
From the main dashboard, click the FluxAI Conversations tile (cyan sparkle icon, last position in the grid).
The top four cards show overall numbers: total conversations, conversion rate, average messages, average tool calls.
The two panels below show the funnel breakdown and the most common industries.
The table lists the last fifty conversations. Click Open on any row to see the full transcript.
Conversations that converted to a consultation ticket display the ticket ID in green at the top of the detail view.

The dashboard reads live data on every page load, so each visit shows the current numbers.

End of report.
