Solo Dev Shipped Production App on Cursor—Then API Hallucinations Nearly Sank It
A solo developer built and deployed a full-stack LLM platform (3 API integrations, real-time streaming, React/Express/TypeScript) almost entirely using Cursor + Codex. The tool excelled at scaffolding and pattern replication—until API hallucinations, scope creep, race conditions, and silent failures nearly killed the project in production.
The promise was real: a solo dev, Cursor, and a full-stack platform with Anthropic, OpenAI, and Google LLM integrations, real-time SSE streaming, a React frontend, and an Express + TypeScript backend running on SQLite. Scaffolding that would have taken weeks came together in days. Pattern replication worked—build one API integration by hand, watch Codex replicate it across two more providers. Types stayed consistent between frontend and backend almost without thinking.
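The replication pattern the dev leaned on can be sketched as a shared provider interface: write one implementation by hand, then stamp out the rest to the same shape. This is a minimal illustrative sketch, not the project's actual code—the interface, stub factory, and registry names are all hypothetical.

```typescript
// Hypothetical provider abstraction: one interface, N vendor implementations.
interface ChatProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Stubs stand in for real SDK calls (Anthropic, OpenAI, Google) in this sketch.
const makeStub = (name: string): ChatProvider => ({
  name,
  complete: async (prompt: string) => `[${name}] echo: ${prompt}`,
});

const providers: Record<string, ChatProvider> = {
  anthropic: makeStub("anthropic"),
  openai: makeStub("openai"),
  google: makeStub("google"),
};

// Central lookup keeps route handlers vendor-agnostic.
function getProvider(id: string): ChatProvider {
  const provider = providers[id];
  if (!provider) throw new Error(`unknown provider: ${id}`);
  return provider;
}
```

Once the first implementation exists, adding a second provider is mostly mechanical—which is exactly the kind of work the article says Codex did well.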
Then the wheels came off.
Codex hallucinated model IDs that don't exist, conflated OpenAI's Chat Completions API with its separate Responses API, and invented parameters. The code compiled. The app crashed at runtime. Every external API call needed manual verification against the actual documentation. A two-line hardcoded fix triggered a seven-phase refactoring that touched every backend file—and it happened repeatedly. Containing the scope creep became muscle memory: write tight prompts, or watch your codebase get rewritten to fix a typo.
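One cheap defense against hallucinated model IDs is to fail fast at startup rather than at the vendor's API boundary: validate every configured model against a hand-maintained allowlist synced from the provider docs. A hedged sketch—the specific model strings below are examples and go stale, which is exactly why the list must be curated by a human, not generated.

```typescript
// Hand-maintained allowlist of model IDs, kept in sync with provider docs.
// These example IDs may be outdated; verify against current documentation.
const KNOWN_MODELS: Record<string, Set<string>> = {
  openai: new Set(["gpt-4o", "gpt-4o-mini"]),
  anthropic: new Set(["claude-3-5-sonnet-20241022"]),
};

// Throws before any network call if the model ID is unrecognized,
// turning a runtime crash into a clear configuration error.
function assertKnownModel(provider: string, model: string): void {
  const models = KNOWN_MODELS[provider];
  if (!models || !models.has(model)) {
    throw new Error(
      `Unrecognized model "${model}" for provider "${provider}"; check the docs.`
    );
  }
}
```

Calling `assertKnownModel("openai", "gpt-5-ultra-turbo")` would throw immediately, instead of compiling cleanly and crashing in production the way the hallucinated IDs did.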
Streaming logic was the killshot. Every time Codex touched the SSE race-condition handling, the code looked right until it ran under load. The dev wrote all concurrency code by hand after that. Worse: Codex set "reasonable" token limits that silently truncated JSON output. The app looked functional. It returned garbage. No errors thrown. Days of debugging before the truth surfaced.
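The silent-truncation failure has a simple structural fix: check the completion's finish reason before trusting the payload. A minimal sketch, assuming a response shape modeled on OpenAI's `finish_reason` field—the interface and function names here are illustrative, not the project's code.

```typescript
// Simplified completion result; "length" means output was cut off by the
// token limit (modeled on OpenAI's finish_reason values).
interface CompletionResult {
  text: string;
  finishReason: "stop" | "length" | "content_filter";
}

// Refuse to parse truncated or malformed output instead of returning garbage.
function parseJsonCompletion(result: CompletionResult): unknown {
  if (result.finishReason === "length") {
    throw new Error("Output truncated by the token limit; raise it and retry.");
  }
  try {
    return JSON.parse(result.text);
  } catch {
    throw new Error("Model returned malformed JSON.");
  }
}
```

A guard like this turns "the app looked functional but returned garbage" into a loud error on the first truncated response, which would have collapsed days of debugging into minutes.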
The hard lesson: trust the tool for structure and boilerplate, verify anything that talks to the outside world, and write concurrency yourself. Even at 4x scaffolding speed, an AI agent can turn a feature request into a production nightmare in a single prompt.
