AI Drag Racing
A live benchmark experiment by Jonathan R Reed that races AI models side by side so you can watch latency in real time.
Use the same prompt and comparable settings when you want a fair read. The useful signal is not only which model finishes first, but which one starts quickly, streams steadily, handles the task cleanly, and avoids provider errors during a real browser session.
How to read an AI model race
AI Drag Racing compares model behavior under the same prompt, selected settings, and local browser session. Time to first token shows how quickly a provider starts responding. Total response time shows how long the full answer takes. Tokens per second is useful for longer generations because a model can start quickly but still stream slowly after the first token appears.
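The three timing metrics above can be sketched in code. This is a minimal illustration, not the app's implementation: streamText here is a stand-in for a real provider's streaming API, and token counting is simplified to one token per streamed chunk.

```typescript
// Simulated streaming source: a stand-in (assumption) for a real
// provider stream, yielding one chunk per delay interval.
async function* streamText(chunks: string[], delayMs: number) {
  for (const chunk of chunks) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield chunk;
  }
}

interface RaceMetrics {
  ttftMs: number;       // time to first token
  totalMs: number;      // total response time
  tokensPerSec: number; // streaming rate after the first token
  tokenCount: number;
}

// Walk the stream once, timestamping the first token and the end,
// then derive the three metrics the article describes.
async function measure(stream: AsyncIterable<string>): Promise<RaceMetrics> {
  const start = performance.now();
  let firstAt = start;
  let tokenCount = 0;
  for await (const _chunk of stream) {
    tokenCount++;
    if (tokenCount === 1) firstAt = performance.now();
  }
  const end = performance.now();
  // Rate is measured over the streaming window, after the first token,
  // so a model with a fast start but a slow stream is still visible.
  const streamingMs = Math.max(end - firstAt, 1);
  return {
    ttftMs: firstAt - start,
    totalMs: end - start,
    tokensPerSec: (tokenCount - 1) / (streamingMs / 1000),
    tokenCount,
  };
}
```

This also makes the article's point concrete: a model can post a low ttftMs and still have a low tokensPerSec, which only shows up on longer generations.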
Treat every run as a live measurement, not a permanent leaderboard. Network route, provider load, regional availability, selected model, prompt length, reasoning settings, and output length all affect results. For a fair comparison, select comparable models, reuse the same prompt, run more than one race, and compare both the timing metrics and the actual answer quality.
The app is built for practical evaluation work: checking which model feels fastest for coding prompts, support drafts, writing tasks, summarization, structured extraction, and agent workflows. It is also a useful way to spot provider errors, slow starts, rate-limit behavior, and models that look fast only because they produce shorter answers.
Repeated runs matter. A single race can show a spike, but a few consistent runs reveal whether the delay comes from the provider, the model, the prompt shape, or the current network path.
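One way to read a set of repeated runs is to compare the median against the spread: the median shows the typical latency, while a large spread relative to the median points at a one-off spike rather than a consistently slow provider. A minimal sketch, not part of the app:

```typescript
// Median of a set of run timings (in ms); robust to a single spike.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Max-minus-min spread; a spread much larger than the median suggests
// one outlier run, while a tight spread around a high median suggests
// the provider or model is consistently slow.
function spread(values: number[]): number {
  return Math.max(...values) - Math.min(...values);
}
```

For example, total times of [410, 395, 1900, 402] ms have a median near 406 ms with a very wide spread, which reads as a single spike; [1850, 1920, 1880, 1900] ms reads as a consistently slow run.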