Running an AI-Powered Site: Cost Analysis

What it actually costs to run a personal portfolio with gpt-5.4-mini chat, realtime voice, and gpt-4o job-description analysis — and the controls that keep it affordable.

Real-Time Cost Savings

Every cache hit is an LLM call that didn't happen. The live cache hit rate represents direct cost avoidance — visible on the Stats page.

Infrastructure Costs

Costs are shown as ranges; exact amounts are intentionally not disclosed.

VPS / Docker host

$5–15/month

Single server, 2–4 vCPU, 4–8 GB RAM. Adequate for all services.

Domain name

$10–15/year

Standard .dev or .io domain.

PostgreSQL + Redis

Self-hosted as Docker containers. No managed DB cost.

CDN / DDoS protection

$0–5/month

Cloudflare free tier covers most needs.

AI API Costs

Pricing is based on published OpenAI rates. Per-use figures are illustrative estimates that assume typical usage patterns and include semantic cache effects; they are not billed amounts. The only live, measured figure on this site is the cache hit rate described above and shown on the Stats page.

Chat (GPT-5.4 mini)

gpt-5.4-mini

Semantic caching keeps the most frequent interaction affordable; a cache hit costs effectively nothing.

Impact: High — most frequent user interaction.

JD Analysis (GPT-4o)

gpt-4o

~$0.01–0.03 per analysis depending on JD length.

Impact: Medium — triggered by explicit user action.

Voice (Realtime API)

gpt-realtime-2/low

Audio tokens are billed separately from text tokens. Sessions are capped because voice has the highest per-session API cost of any feature.

Impact: High per-session but infrequent.

Embeddings (text-embedding-3-small)

text-embedding-3-small

Well under $0.0001 per query embedding, since a query is only a few dozen tokens. Essentially negligible.

Impact: Low.

Cost Controls

The site was designed with cost predictability from the start. Semantic caching is the most impactful control — it reduces LLM calls while also improving response time for cached queries.

Semantic Caching

40–60% reduction in LLM API calls

Redis caches LLM responses keyed by embedding similarity. A cache hit rate of 40–60% is typical for a portfolio because most visitors ask similar questions.
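
The lookup described here can be sketched roughly as follows. This is a minimal in-memory illustration, not the site's implementation: the class name SemanticCache, the 0.9 similarity threshold, and the linear scan are all assumptions, and a real deployment would store vectors in Redis and embed queries with text-embedding-3-small.

```python
from __future__ import annotations

import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory stand-in for a Redis-backed semantic cache."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed cutoff, not a value from the site
        self.entries: list[tuple[list[float], str]] = []
        self.hits = 0
        self.misses = 0

    def get(self, query_embedding: list[float]) -> Optional[str]:
        """Return the response of the most similar cached query, if similar enough."""
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1  # an LLM call that did not happen
            return best_response
        self.misses += 1  # caller falls through to the LLM, then calls put()
        return None

    def put(self, query_embedding: list[float], response: str) -> None:
        """Store a fresh LLM response under the query's embedding."""
        self.entries.append((query_embedding, response))

    def hit_rate(self) -> float:
        """Fraction of lookups served from cache so far."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The threshold is the main tuning knob in a design like this: set too low, visitors get answers to questions they did not ask; set too high, the hit rate drops and more requests reach the LLM.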

Rate Limiting

Protects against abuse, not routine traffic

Per-IP and per-session sliding windows. Prevents any single user from consuming disproportionate API budget.
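
A sliding-window limiter of this kind might look like the sketch below. The class name, the limits, and the in-memory store are assumptions for illustration; the site's actual limits and storage backend are not stated. The key can be an IP address or a session ID, so one class covers both windows.

```python
from __future__ import annotations

from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical sliding-window log limiter keyed by IP or session ID."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # One queue of request timestamps per key, oldest first.
        self._timestamps: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        """Record and allow the request unless the key already used its quota."""
        window = self._timestamps[key]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # rejected requests are not recorded
        window.append(now)
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the limiter deterministic and easy to test.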

Context Window Budget

~30% savings vs unbounded retrieval

Retrieved chunks are limited to top-5 by similarity. Prompt length is bounded, keeping token costs predictable.
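
The top-5 cutoff can be illustrated with a short sketch. The function name and the (score, text) pair layout are assumptions; only the limit of five comes from the description above.

```python
from __future__ import annotations

import heapq


def select_top_chunks(scored_chunks: list[tuple[float, str]], k: int = 5) -> list[str]:
    """Keep the k highest-similarity chunks, best first.

    scored_chunks holds (similarity, chunk_text) pairs from the retriever.
    """
    top = heapq.nlargest(k, scored_chunks, key=lambda pair: pair[0])
    return [text for _, text in top]
```

Because at most k chunks ever reach the prompt, the retrieval portion of the input token count has a fixed upper bound regardless of how many chunks match.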

Voice Session Limits

Bounds worst-case cost per user

Voice has the highest per-minute cost of any feature, so every voice session has a maximum duration.
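
As a rough sketch of how a duration cap bounds spend: the class below tracks the time left in a session, and the helper converts a cap into a worst-case cost. The names, the cap, and the per-minute rate are all hypothetical; the site's real limit and the effective Realtime API rate are not given here.

```python
from dataclasses import dataclass


@dataclass
class VoiceSession:
    """Hypothetical session record with a hard duration cap."""

    started_at: float   # seconds, e.g. from time.monotonic()
    max_seconds: float  # assumed cap; the site's real limit is not stated

    def remaining(self, now: float) -> float:
        """Seconds left before the session must be closed (never negative)."""
        return max(0.0, self.max_seconds - (now - self.started_at))

    def expired(self, now: float) -> bool:
        """True once the cap has been reached."""
        return self.remaining(now) == 0.0


def worst_case_cost(max_seconds: float, cost_per_minute: float) -> float:
    """Upper bound on one session's cost, given an assumed per-minute rate."""
    return (max_seconds / 60.0) * cost_per_minute
```

With a cap in place, the worst case per user is a fixed number rather than a function of how long someone leaves a session open.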

Monthly Cost Range

Total monthly cost depends heavily on traffic and voice usage. At typical personal portfolio traffic levels (a few hundred visitors/month), the dominant cost is infrastructure, not API usage. API costs only become significant at scale or with heavy voice usage.

Low traffic

$5–10/month

~50–100 chat sessions, no voice

Moderate traffic

$10–20/month

~200–500 sessions, occasional voice

High traffic / active job search

$20–40/month

Frequent JD analyses + voice demos

* Ranges are illustrative — exact figures intentionally not disclosed.
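
The way these ranges combine can be sketched as a back-of-the-envelope calculation. Everything in the function below is an assumption for illustration: the parameter names are made up, and the inputs in the usage note are placeholders rather than measured or billed values.

```python
def estimate_monthly_cost(
    infrastructure: float,
    chat_requests: int,
    cache_hit_rate: float,
    cost_per_chat_call: float,
    jd_analyses: int = 0,
    cost_per_jd_analysis: float = 0.0,
    voice_minutes: float = 0.0,
    cost_per_voice_minute: float = 0.0,
) -> float:
    """Rough monthly total in dollars: fixed infrastructure plus variable API spend.

    Cache hits are treated as free; the small embedding cost per lookup is ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    # Only cache misses reach the LLM and get billed.
    billed_chat_calls = chat_requests * (1.0 - cache_hit_rate)
    api_cost = (
        billed_chat_calls * cost_per_chat_call
        + jd_analyses * cost_per_jd_analysis
        + voice_minutes * cost_per_voice_minute
    )
    return infrastructure + api_cost
```

For example, with placeholder inputs of $10 infrastructure, 100 chat requests at a 50% hit rate and $0.01 per billed call, plus 10 JD analyses at $0.02 each, the estimate is about $10.70, almost all of it infrastructure.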