Terminal-Bench (agentic terminal tasks)

End-to-end tasks an engineer would do in a real terminal: building code, configuring servers, debugging environments. The model operates a shell autonomously until the task is done. Scores are a strong predictor of performance inside CLI coding agents.

~ marks community-reported or version-normalized figures; all other figures come from official model cards. Prices are shown as input/output per 1M tokens. Updated 2026-06-10.