MMMU

Multimodal

Massive Multi-discipline Multimodal Understanding

11,500 college-level questions across 30 subjects that require jointly understanding text and images such as charts, diagrams, maps, music sheets, and chemical structures. One of the most widely used benchmarks for visual reasoning in multimodal models.

~ marks community-reported or version-normalized figures; all other figures come from official model cards. Prices are shown as input/output cost per 1M tokens. Updated 2026-06-10.
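Because prices are quoted per 1M tokens, the cost of a single request is each token count times its per-million price, divided by 1,000,000. The sketch below only illustrates that arithmetic. The helper name `request_cost_usd` and the $3.00 / $15.00 prices are hypothetical, not figures taken from this page.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Hypothetical helper: cost of one request when prices are quoted per 1M tokens."""
    # Scale each side by its own per-million price, then convert from "per 1M" to per token.
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical prices: $3.00 input / $15.00 output per 1M tokens.
cost = request_cost_usd(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # prints "$0.0135"
```

Input and output are kept as separate arguments because providers typically price them differently, which is why the page lists them as a pair.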