Architecture Decisions
Why the site is built the way it is — every significant choice with its rationale and tradeoffs.
System at a Glance
The browser exchanges WebRTC audio with the Realtime model, authenticated by a short-lived ephemeral token; transcripts stream over a separate WebSocket.
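The point of an ephemeral token is that the browser never holds a long-lived credential. The site most likely obtains its token from the Realtime provider, and this sketch does not reproduce that API. It only illustrates the general idea of a short-lived, signed token using Python's standard `hmac` module. The server secret, the 60-second lifetime, and the `mint_token`/`verify_token` helpers are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"hypothetical-server-secret"  # assumption: loaded from config in practice
TOKEN_TTL_SECONDS = 60  # assumption: one-minute lifetime


def mint_token(session_id: str, now: Optional[float] = None) -> str:
    """Encode a session id plus an expiry time and sign it with HMAC-SHA256."""
    issued = time.time() if now is None else now
    payload = json.dumps({"sid": session_id, "exp": issued + TOKEN_TTL_SECONDS}).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the session id if the signature matches and the token is unexpired."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong shape or invalid base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        return None
    return claims["sid"]
```

Because expiry is checked on every use, a leaked token of this kind is only useful for whatever remains of its lifetime.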
Read-only tools any agent can call: search, get_timeline, get_cv.
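One way to guarantee that agents can only read is an explicit allowlist: anything not registered simply cannot be called. This is a minimal sketch, assuming the tools are plain Python functions. Only the three tool names come from the page; the sample timeline and CV data, the `TOOLS` registry, and `call_tool` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical sample data standing in for the site's real content.
_TIMELINE = [
    {"year": 2020, "event": "Example event A"},
    {"year": 2022, "event": "Example event B"},
]
_CV = {"name": "Example Person", "skills": ["Python", "TypeScript"]}


def search(query: str) -> List[dict]:
    """Case-insensitive substring search over timeline events."""
    q = query.lower()
    return [item for item in _TIMELINE if q in item["event"].lower()]


def get_timeline() -> List[dict]:
    # Return copies so callers cannot mutate the underlying data.
    return [dict(item) for item in _TIMELINE]


def get_cv() -> dict:
    # Copy the skills list as well, for the same reason.
    return {"name": _CV["name"], "skills": list(_CV["skills"])}


# Only these names can be invoked; no write tools are registered.
TOOLS: Dict[str, Callable[..., object]] = {
    "search": search,
    "get_timeline": get_timeline,
    "get_cv": get_cv,
}


def call_tool(name: str, **kwargs) -> object:
    """Dispatch a tool call by name, rejecting anything outside the allowlist."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```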
Live System Metrics
The site's cache and latency numbers come straight from the running backend; the full dashboard is on the Stats page.
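The page doesn't describe how the backend derives these numbers. As a hypothetical sketch, a cache hit rate and a latency percentile could be computed from raw counters and timing samples as shown here. The helper names are invented, and nearest-rank is only one of several common percentile definitions.

```python
import math
from typing import List


def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list, with pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]
```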
Why Two Services?
The frontend (Next.js) and backend (FastAPI) are intentionally separate services rather than a monolith. The primary reason is practical: a monolith deployed on a serverless platform cannot hold persistent connections, run background tasks, or maintain a vector similarity cache, and this site needs all three.
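To make the vector similarity cache concrete: a semantic cache returns a stored answer when a new query's embedding is close enough to a previous one, so a paraphrased question can skip the model call. This is a hypothetical in-memory stand-in (the site itself uses Redis) with a linear scan, cosine similarity, and an arbitrary 0.9 threshold. Embeddings are assumed to come from an external model.

```python
import math
from typing import List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hits when a stored query embedding is at least `threshold` cosine-similar."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def get(self, embedding: List[float]) -> Optional[str]:
        # Find the most similar stored query, then apply the threshold.
        best_score, best_answer = -1.0, None
        for stored, answer in self._entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, embedding: List[float], answer: str) -> None:
        self._entries.append((embedding, answer))
```

Held in process memory like this, the cache would vanish on every serverless cold start, which is one reason a long-running service or an external store is needed.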
The split also mirrors real-world backend engineering. The FastAPI service handles authentication, rate limiting, semantic caching, vector retrieval, and streaming, all of which belong in a dedicated backend rather than a Next.js API route.
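Authentication and rate limiting are typically composed as a chain of middleware, each wrapping the next handler. This framework-free sketch assumes requests are plain dicts. `require_api_key`, the `demo-key` allowlist, the token-bucket `rate_limit`, and `chain` are hypothetical and do not use FastAPI's own middleware API.

```python
import time
from typing import Callable, Dict, Tuple

Request = Dict[str, str]
Handler = Callable[[Request], str]
Middleware = Callable[[Handler], Handler]

API_KEYS = {"demo-key"}  # hypothetical allowlist


def require_api_key(next_handler: Handler) -> Handler:
    def handler(req: Request) -> str:
        if req.get("api_key") not in API_KEYS:
            return "401 Unauthorized"
        return next_handler(req)
    return handler


def rate_limit(capacity: int, refill_per_sec: float,
               clock: Callable[[], float] = time.monotonic) -> Middleware:
    """Token-bucket limiter keyed by client id."""
    buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def middleware(next_handler: Handler) -> Handler:
        def handler(req: Request) -> str:
            now = clock()
            key = req.get("client", "anonymous")
            tokens, last = buckets.get(key, (float(capacity), now))
            # Refill in proportion to elapsed time, capped at capacity.
            tokens = min(capacity, tokens + (now - last) * refill_per_sec)
            if tokens < 1:
                buckets[key] = (tokens, now)
                return "429 Too Many Requests"
            buckets[key] = (tokens - 1, now)
            return next_handler(req)
        return handler
    return middleware


def chain(handler: Handler, *middlewares: Middleware) -> Handler:
    """Wrap handler so the first middleware listed runs first."""
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler
```

Running authentication before rate limiting is an assumption here; putting the limiter first would also throttle unauthenticated traffic, which is often preferable against brute-force attempts.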
Frontend (Next.js)
Server Components for static content
Client Components for interactive features
Streaming UI for progressive enhancement
App Router for nested layouts
Backend (FastAPI)
Async Python with full type annotations
RAG pipeline for LLM grounding
Redis semantic cache
Security middleware chain
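At the core of a RAG pipeline, retrieval ranks stored passages by similarity to the query embedding and passes the top matches to the model as grounding context. This is a minimal sketch, assuming precomputed embeddings held in a Python list. `retrieve`, `build_prompt`, and the prompt wording are hypothetical, and a production system would query a vector store instead of sorting the whole corpus.

```python
import math
from typing import List, Tuple

# Hypothetical corpus of (passage text, embedding) pairs.
Corpus = List[Tuple[str, List[float]]]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: List[float], corpus: Corpus, k: int = 2) -> List[str]:
    """Return the k passages most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Number the retrieved passages and instruct the model to stay within them."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```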