Lessons Learned
Honest reflections on what worked, what was harder than expected, and what would be done differently. No polish — just the actual lessons.
Technical Lessons
Specific bugs and surprises encountered during implementation — the kind of things that don't show up in tutorials.
Pydantic AI: cumulative vs delta streaming
By default, Pydantic AI's text streaming yields cumulative text (the full response so far), not deltas. Forwarding each chunk to the client as-is repeats everything already displayed. The fix is to track the offset of the last character sent and send only the new suffix from each chunk.
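The offset-tracking fix can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `cumulative_to_deltas` and the simulated chunks are assumptions, and a real handler would apply the same logic inside an `async for` over the stream.

```python
from typing import Iterable, Iterator


def cumulative_to_deltas(snapshots: Iterable[str]) -> Iterator[str]:
    """Yield only the newly added suffix of each cumulative snapshot."""
    sent = 0  # number of characters already forwarded to the client
    for text in snapshots:
        if len(text) > sent:
            yield text[sent:]
            sent = len(text)


# Simulated cumulative stream: each item is the full response so far.
chunks = ["Hel", "Hello", "Hello, wor", "Hello, world"]
deltas = list(cumulative_to_deltas(chunks))
print(deltas)
```

Concatenating the yielded pieces reproduces the final response exactly once. Recent Pydantic AI versions also appear to accept a `delta=True` argument on `stream_text`, which would avoid the bookkeeping entirely; check the documentation for the installed version before relying on it.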
asyncpg serialization with complex types
asyncpg does not convert Python lists or dicts into JSONB or vector values automatically. Passing a Python list directly to a pgvector column fails at query time, not at schema-validation time. Either pass the value as a string (for example `str(embedding)`) with an explicit cast in the SQL, or register a codec for the type.
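As a rough sketch of the string-plus-cast approach, the helpers below build the text forms that the `::jsonb` and `::vector` casts expect. The table name, column names, and helper names are hypothetical, and the query itself is shown only as a string because running it needs a live connection.

```python
import json
from typing import Any, Sequence


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. '[0.1,0.2,3.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def to_jsonb_literal(value: Any) -> str:
    """Serialize a Python object to a JSON string for a $n::jsonb parameter."""
    return json.dumps(value)


# Hypothetical table and column names, for illustration only.
INSERT_CHUNK = """
    INSERT INTO chunks (content, metadata, embedding)
    VALUES ($1, $2::jsonb, $3::vector)
"""

params = (
    "hello",
    to_jsonb_literal({"source": "cv"}),
    to_vector_literal([0.1, 0.2, 3]),
)
# With a live asyncpg connection (assumed): await conn.execute(INSERT_CHUNK, *params)
print(params)
```

The alternative is a codec registered once per connection (asyncpg exposes `set_type_codec` for this, and the pgvector Python package ships an asyncpg helper), which keeps call sites free of manual serialization.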
Windows stdout encoding bug (structlog + uvicorn)
On Windows, structlog's JSON renderer causes an OSError (errno 22) when uvicorn writes log entries to stdout. The workaround was to run the backend in Docker, which gives consistent behavior across platforms without fighting Windows encoding.
fakeredis event loop binding
Using fakeredis's aioredis FakeRedis in pytest with FastAPI's TestClient involves two event loops (one for the test, one for the server), and a client created at module level ends up bound to the wrong one. The fix is FakeServer() (sync-safe) with a FakeRedis client created inside the dependency override, not at module level.
HTTPException headers in FastAPI
Raising HTTPException discards the injected Response object, so headers set on it never reach the client. Headers (e.g. Retry-After for rate limiting) must be passed to HTTPException(headers=...) instead.
Architecture Lessons
Tradeoffs that surfaced during implementation and things that would be done differently with the benefit of hindsight.
Monolith would have been faster to prototype
Two services add operational complexity: CORS, separate deployments, two sets of tests. For a portfolio, a Next.js-only architecture with server-side OpenAI calls would launch faster. The split was chosen to demonstrate backend skills, not because it's the simplest solution.
pgvector with HNSW is enough at this scale
At a few hundred chunks, pgvector handles vector search well within latency targets — no separate vector database needed. The site uses an HNSW index, which gives strong recall and low query latency without periodic re-tuning as the corpus grows, at the cost of index build time and memory. At a much larger scale, a purpose-built vector store could still be worth the operational cost.
Semantic cache threshold needs calibration
A cosine similarity threshold of 0.93 was chosen conservatively to avoid serving cached responses to semantically different questions. Too low and wrong answers are cached; too high and the cache rarely hits. The right threshold is dataset-specific.
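A minimal sketch of the threshold check, assuming the cache is a plain list of (embedding, answer) pairs. The real cache lives in Redis; the function names and the toy two-dimensional embeddings here are illustrative only.

```python
import math
from typing import List, Optional, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.93  # value from the text; the right number is dataset-specific


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(
    query: Sequence[float],
    cache: List[Tuple[Sequence[float], str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Optional[str]:
    """Return the best cached answer if it clears the threshold, else None."""
    best_score, best_answer = 0.0, None
    for embedding, answer in cache:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_score, best_answer = score, answer
    return best_answer if best_score >= threshold else None


cache = [([1.0, 0.0], "answer A"), ([0.0, 1.0], "answer B")]
hit = lookup([0.99, 0.05], cache)   # nearly parallel to the first entry
miss = lookup([0.7, 0.7], cache)    # equally far from both entries
print(hit, miss)
```

Calibration then amounts to running known paraphrase pairs and known distinct questions through `cosine_similarity` and picking a threshold that separates the two groups.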
Rate limiting storage in Redis is a single point of failure
If Redis is unavailable, the rate limiter falls back to in-memory counters. This is intentional (availability over consistency), but it means rate limits are per-instance, not global, while in degraded mode.
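The degraded-mode behavior can be illustrated with a small fixed-window limiter. This is a sketch under stated assumptions, not the project's implementation: the class name, the `shared_incr` callable (standing in for a Redis INCR with expiry), and the use of the built-in ConnectionError are all hypothetical. A real wrapper would catch the Redis client's own exception type.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Optional


class FallbackRateLimiter:
    """Fixed-window limiter that counts in a shared store, or locally if it fails."""

    def __init__(self, shared_incr: Callable[[str], int], limit: int, window_s: int = 60):
        self._shared_incr = shared_incr  # assumed to increment and return the new count
        self._limit = limit
        self._window_s = window_s
        # Per-process counters, used only in degraded mode (never pruned in this sketch).
        self._local: Dict[str, int] = defaultdict(int)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = f"rl:{client_id}:{int(now // self._window_s)}"
        try:
            count = self._shared_incr(key)
        except ConnectionError:
            # Degraded mode: the limit is now enforced per instance, not globally.
            self._local[key] += 1
            count = self._local[key]
        return count <= self._limit


def broken_store(key: str) -> int:
    """Stand-in for an unreachable Redis."""
    raise ConnectionError("redis down")


limiter = FallbackRateLimiter(broken_store, limit=2)
results = [limiter.allow("203.0.113.7", now=0.0) for _ in range(3)]
print(results)
```

With two instances behind a load balancer, each would allow up to the full limit on its own during an outage, which is the per-instance caveat described above.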
AI Development Lessons
Practical lessons on where AI assistance accelerates development and where it requires more careful guidance.
AI excels at boilerplate and well-defined patterns
CRUD endpoints, component scaffolding, test setup, type definitions — AI generates these correctly and quickly. The more the task resembles patterns seen in training data, the better the output.
AI needs explicit context for design decisions
"Add a cache" produces generic caching. "Add a Redis semantic similarity cache with cosine threshold 0.93 and 24h TTL that checks before the LLM call" produces the right thing. Specificity is the user's job.
Plan files prevent scope creep
Without explicit task boundaries, AI tends to add features it thinks are useful. PLAN.md tasks with explicit done-criteria kept each session focused. Deviations were tracked rather than silently accepted.
AI-generated tests are high quality
The test suite has comprehensive coverage because AI tests edge cases that humans often skip (null inputs, race conditions, encoding edge cases). The test-first approach also caught bugs before they reached integration.
Security review still requires human judgment
AI knows security patterns and can implement them. It does not know the specific threat model for this application. The rate limiting parameters, injection patterns, and output filters were reviewed and tuned by a human.
For Recruiters
This project demonstrates real-world engineering skills, not just the ability to follow a tutorial. Here is the evidence behind each claim.
Designed a two-service architecture with explicit tradeoffs: RAG over fine-tuning, pgvector over managed vector DB, Redis semantic cache. Each choice has a documented rationale.
Implemented rate limiting, injection detection, output filtering, and a blocking system. Chose HMAC-SHA256 for PII hashing. Reviewed threat model against a personal portfolio context.
Full test suite: unit tests (pytest + vitest), integration tests (FastAPI TestClient), and end-to-end tests (Playwright). TDD used for security-critical paths.
Streaming RAG pipeline, semantic caching, job description analysis, real-time voice. Not just API calls — production-grade integration with error handling, fallbacks, and cost controls.
Docker for consistent environments, devlog for change tracking, cost analysis with controls, degraded-mode handling for Redis failures.