Running an AI-Powered Site: Cost Analysis
What it actually costs to run a personal portfolio with gpt-5.4-mini chat, realtime voice, and gpt-4o job-description analysis — and the controls that keep it affordable.
Real-Time Cost Savings
Every cache hit is an LLM call that didn't happen. The live cache hit rate, shown on the Stats page, is therefore a direct measure of cost avoidance.
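The arithmetic behind "direct cost avoidance" is simple enough to sketch. This assumes the cache exposes hit and miss counters. CacheStats, avoided_cost_usd, and the $0.002 average per-call cost are hypothetical names and inputs, not the site's code or figures.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Hypothetical hit/miss counters, as a cache might expose them."""
    hits: int
    misses: int

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def avoided_cost_usd(stats: CacheStats, avg_llm_cost_usd: float) -> float:
    """Spend avoided: every hit is one LLM call that was never made.

    The query embedding is paid on hits and misses alike, so it does not
    change the amount avoided.
    """
    return stats.hits * avg_llm_cost_usd


stats = CacheStats(hits=450, misses=550)
print(f"hit rate {stats.hit_rate:.0%}, avoided ${avoided_cost_usd(stats, 0.002):.2f}")
```

Because the avoided amount scales with hits times the average call cost, the same hit rate is worth more on expensive calls than on cheap ones.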
Infrastructure Costs
Costs shown as ranges — exact amounts are intentionally not disclosed.
Server: single server, 2–4 vCPU, 4–8 GB RAM. Adequate for all services.
Domain: standard .dev or .io domain.
Data stores: self-hosted as Docker containers, so there is no managed-database cost.
CDN / DNS: the Cloudflare free tier covers most needs.
AI API Costs
Pricing is based on published OpenAI rates. Per-use figures are illustrative estimates that assume typical usage patterns and include semantic cache effects; they are not billed amounts. The only live, measured figure on this site is the cache hit rate on the Stats page.
Chat (gpt-5.4-mini): semantic caching keeps the most frequent interaction affordable, because a cache hit costs effectively nothing.
Impact: High — most frequent user interaction.
Job-description analysis (gpt-4o): ~$0.01–0.03 per analysis, depending on JD length.
Impact: Medium — triggered by explicit user action.
Realtime voice: audio tokens are billed separately from text tokens. Sessions are capped because voice has the highest per-session API cost.
Impact: High per-session but infrequent.
Embeddings: ~$0.0001 per query embedding, which is negligible.
Impact: Low.
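The per-use figures above come from token arithmetic: tokens in and out, times a per-million-token rate, with the cache removing the LLM share of a call on a hit. The sketch below assumes that model. The $2.50 / $10.00 per 1M token rates, the 2,000-token prompt overhead, and the 800-token output are made-up example inputs, not OpenAI's current prices or the site's real prompt sizes. call_cost_usd and expected_chat_cost_usd are hypothetical helpers.

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Cost of one API call, given per-1M-token input and output rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def expected_chat_cost_usd(llm_cost: float, embed_cost: float, hit_rate: float) -> float:
    """Every message pays for its query embedding; only cache misses pay for the LLM."""
    return embed_cost + (1.0 - hit_rate) * llm_cost


# ASSUMED example rates in USD per 1M tokens; check the published price list.
IN_RATE, OUT_RATE = 2.50, 10.00
OVERHEAD_TOKENS = 2_000  # hypothetical system prompt + resume context
OUTPUT_TOKENS = 800      # hypothetical length of the analysis

for jd_tokens in (500, 2_000, 4_000):
    cost = call_cost_usd(OVERHEAD_TOKENS + jd_tokens, OUTPUT_TOKENS, IN_RATE, OUT_RATE)
    print(f"JD of {jd_tokens:>5} tokens -> ${cost:.4f}")
```

With these inputs the three JD sizes cost roughly $0.014, $0.018, and $0.023. That shows how JD length alone moves an analysis across the quoted range.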
Cost Controls
The site was designed with cost predictability from the start. Semantic caching is the most impactful control — it reduces LLM calls while also improving response time for cached queries.
Semantic Caching
40–60% reduction in LLM API calls
Redis caches LLM responses by embedding similarity. A cache hit rate of 40–60% is typical for a portfolio, because most visitors ask similar questions.
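A minimal sketch of lookup by embedding similarity, assuming the query has already been embedded by some embedding model. SemanticCache, cached_answer, and the 0.92 cosine threshold are hypothetical. A plain Python list stands in for Redis so the example runs without a server, so treat this as an illustration of the technique rather than the site's implementation.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """In-memory stand-in for a Redis-backed semantic cache (linear scan)."""

    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold  # ASSUMED cutoff; tune against real queries
        self.entries: list[tuple[list[float], str]] = []

    def get(self, query_vec: list[float]) -> Optional[str]:
        """Return the most similar stored response, if it clears the threshold."""
        best_score, best_response = -1.0, None
        for vec, response in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query_vec: list[float], response: str) -> None:
        self.entries.append((query_vec, response))


def cached_answer(cache: SemanticCache, query_vec: list[float],
                  call_llm: Callable[[], str]) -> str:
    """Serve from the cache on a hit; otherwise call the LLM and store the result."""
    cached = cache.get(query_vec)
    if cached is not None:
        return cached  # hit: no LLM call, only the embedding was paid for
    response = call_llm()
    cache.put(query_vec, response)
    return response
```

The threshold is the main design lever. Set it too low and visitors get answers to a different question. Set it too high and near-duplicate questions miss the cache, pushing the hit rate down.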
Rate Limiting
Protects against abuse, not routine traffic
Per-IP and per-session sliding windows prevent any single user from consuming a disproportionate share of the API budget.
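A sliding window can be sketched as a per-key log of recent request timestamps. This shows the sliding-window-log variant of the technique. SlidingWindowLimiter and the 30-per-minute and 10-per-minute limits are hypothetical, and the site's actual limits and storage (for example, Redis instead of process memory) are not stated.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class SlidingWindowLimiter:
    """At most `limit` requests per key in any trailing `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.events: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = self.clock() if now is None else now
        q = self.events[key]
        # Evict timestamps that have slid out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected requests are not recorded
        q.append(now)
        return True


# Hypothetical limits: one limiter keyed by IP, another keyed by session ID.
per_ip = SlidingWindowLimiter(limit=30, window_s=60)
per_session = SlidingWindowLimiter(limit=10, window_s=60)
```

One simple policy is to serve a request only if per_ip.allow(ip) and per_session.allow(session_id) both return True. With Python's short-circuiting "and", a request that passes the IP check but fails the session check still counts against the IP window, which errs on the strict side.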
Context Window Budget
~30% lower prompt cost than unbounded retrieval
Retrieved chunks are limited to the top 5 by similarity, so prompt length is bounded and token costs stay predictable.
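Top-k selection and the resulting bound on prompt size can be sketched as follows. This assumes each chunk carries a precomputed similarity score and token count, and that the chunker caps chunk size; without that cap the bound does not hold. Chunk, select_context, and prompt_token_bound are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # similarity to the query; higher is more relevant
    tokens: int   # token count, assumed precomputed with the model's tokenizer


def select_context(chunks: list[Chunk], k: int = 5) -> list[Chunk]:
    """Keep only the k most similar chunks, best first."""
    return heapq.nlargest(k, chunks, key=lambda c: c.score)


def prompt_token_bound(system_tokens: int, question_tokens: int,
                       k: int, max_chunk_tokens: int) -> int:
    """Worst-case prompt size once retrieval is capped at k chunks."""
    return system_tokens + question_tokens + k * max_chunk_tokens
```

For example, a 400-token system prompt, a 50-token question, and five chunks of at most 300 tokens can never exceed 1,950 prompt tokens, however many chunks the index holds.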
Voice Session Limits
Bounds worst-case cost per user
Voice sessions have a maximum duration, because voice has the highest per-minute cost of any feature.
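A duration cap turns an open-ended cost into a fixed ceiling. The sketch below assumes a server-side timer checked while the session is live. Realtime audio is billed per token, so the flat per-minute rate here is a simplifying approximation. VoiceSession, worst_case_session_cost_usd, the 300-second cap, and the $0.25/minute rate are hypothetical.

```python
import time
from typing import Callable


class VoiceSession:
    """Tracks one realtime voice session against a hard duration cap."""

    def __init__(self, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_seconds = max_seconds
        self.clock = clock
        self.started_at = clock()

    def remaining(self) -> float:
        return max(0.0, self.max_seconds - (self.clock() - self.started_at))

    def expired(self) -> bool:
        # Check on each audio chunk or on a timer; when True, close the session.
        return self.remaining() == 0.0


def worst_case_session_cost_usd(max_seconds: float, cost_per_minute: float) -> float:
    """With a hard cap, one session can never cost more than cap x rate."""
    return (max_seconds / 60.0) * cost_per_minute
```

Under these assumed numbers a single session cannot exceed $1.25, so the voice budget for a month is bounded by the session count times that ceiling.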
Monthly Cost Range
Total monthly cost depends heavily on traffic and voice usage. At typical personal-portfolio traffic (a few hundred visitors per month), infrastructure is the dominant cost, not API usage. API costs become significant only at scale or with heavy voice usage.
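The monthly picture can be modeled as fixed infrastructure plus usage-driven API lines, with the cache applied to chat. Every input below is made up for illustration and is not this site's figure. MonthlyInputs and monthly_cost are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MonthlyInputs:
    infra_usd: float              # server + domain, amortized per month
    visitors: int
    chat_msgs_per_visitor: float
    chat_llm_cost_usd: float      # per uncached chat call
    embed_cost_usd: float         # per query embedding
    cache_hit_rate: float
    jd_analyses: int
    jd_cost_usd: float            # per analysis
    voice_minutes: float
    voice_cost_per_min_usd: float


def monthly_cost(m: MonthlyInputs) -> dict[str, float]:
    msgs = m.visitors * m.chat_msgs_per_visitor
    # Every message is embedded; only cache misses reach the LLM.
    chat = msgs * (m.embed_cost_usd + (1.0 - m.cache_hit_rate) * m.chat_llm_cost_usd)
    jd = m.jd_analyses * m.jd_cost_usd
    voice = m.voice_minutes * m.voice_cost_per_min_usd
    api = chat + jd + voice
    return {"infra": m.infra_usd, "chat": chat, "jd": jd, "voice": voice,
            "api": api, "total": m.infra_usd + api}


# Made-up inputs for illustration only; not this site's figures.
example = MonthlyInputs(
    infra_usd=20.0, visitors=300, chat_msgs_per_visitor=3.0,
    chat_llm_cost_usd=0.002, embed_cost_usd=0.0001, cache_hit_rate=0.5,
    jd_analyses=20, jd_cost_usd=0.02,
    voice_minutes=30.0, voice_cost_per_min_usd=0.25,
)
for item, usd in monthly_cost(example).items():
    print(f"{item:>6}: ${usd:.2f}")
```

With these inputs, infrastructure outweighs all API spend combined, and voice is the largest API line item. Doubling voice minutes moves the total far more than doubling chat traffic, which matches the claim that voice, not chat, is what makes API costs significant.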
* Ranges are illustrative — exact figures intentionally not disclosed.