# Prompt Arena
Prompt engineering is hard to evaluate objectively. Prompt Arena makes it competitive — users submit prompts, the system runs them against the same LLM, and the community votes on which output is better.
## Architecture
- Backend — FastAPI with async PostgreSQL (via SQLAlchemy + asyncpg)
- Frontend — React SPA with real-time battle updates
- Rating system — Elo-based ranking adapted for prompt quality comparison
- Infrastructure — Docker Compose for local dev, with plans for Kubernetes deployment
## The interesting parts
The Elo system was fun to implement. Each battle outcome updates both prompts’ ratings using the standard formula, but I added a confidence decay factor so prompts that haven’t been tested recently slowly lose rating certainty.
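A minimal sketch of that update, assuming a Glicko-style interpretation of the decay: the K-factor grows with idle time, so stale ratings move faster once fresh evidence arrives. The constants (`K_BASE`, `DECAY_PER_DAY`) are illustrative placeholders, not the production values.

```python
K_BASE = 32.0         # standard Elo K-factor (assumed value)
DECAY_PER_DAY = 0.01  # illustrative uncertainty growth per idle day

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def k_factor(days_idle: float) -> float:
    """Scale K up for prompts that haven't battled recently, so a stale
    rating carries less certainty and adjusts faster (Glicko-style idea)."""
    return K_BASE * (1.0 + DECAY_PER_DAY * days_idle)

def update(r_a: float, r_b: float, score_a: float,
           idle_a: float = 0.0, idle_b: float = 0.0) -> tuple[float, float]:
    """Apply one battle result. score_a is 1.0 for an A win, 0.0 for a loss,
    0.5 for a tie; idle_* is days since each prompt's last battle."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k_factor(idle_a) * (score_a - e_a)
    new_b = r_b + k_factor(idle_b) * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

With equal ratings and no idle time this reduces to the textbook Elo update: the winner gains K/2 points and the loser drops by the same amount.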
Prompt injection defense was also a real concern — users will absolutely try to game the system. Input sanitization and output comparison heuristics help catch obvious manipulation.
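A first pass at that sanitization layer might look like the keyword screen below. The patterns are illustrative placeholders, not the project's actual rules, and a real defense would layer several checks rather than rely on regexes alone.

```python
import re

# Illustrative red-flag patterns for prompts trying to hijack the judge
# or rig the comparison. A production list would be broader and tuned.
SUSPICIOUS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"always (rank|score|vote)",
]

def looks_like_injection(prompt: str) -> bool:
    """Flag prompts that match any known manipulation pattern."""
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in SUSPICIOUS)
```

Flagged submissions could be rejected outright or routed to moderation rather than entering a battle.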
## What I learned
Building a platform with user-generated content means thinking about abuse from day one. Rate limiting, content moderation hooks, and clear community guidelines aren’t afterthoughts — they’re core features.
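Rate limiting, for instance, can start as a plain token bucket per user. This is a hedged stdlib sketch of the idea, not the platform's actual middleware; the rate and capacity values are whatever policy you choose.

```python
import time

class TokenBucket:
    """Minimal per-user rate limiter: refills `rate` tokens per second,
    holds at most `capacity`, and each allowed request spends one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a web backend you would keep one bucket per user (or per IP) and return HTTP 429 when `allow()` is false.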