HLE

Reasoning

Humanity's Last Exam (no tools)

Around 2,500 extremely difficult, expert-written questions spanning more than a hundred subjects, designed to be the final closed-ended academic benchmark of its kind. Scores are reported without tool use. Even frontier models score far below expert level, leaving substantial headroom.

A ~ marks community-reported or version-normalized figures; all other figures come from official model cards. Prices are shown as input/output per 1M tokens. Updated 2026-06-10.