MMMU
Multimodal
Massive Multi-discipline Multimodal Understanding
11,500 college-level questions that require jointly understanding images — charts, diagrams, maps, music sheets, chemical structures — and text across 30 subjects. The standard benchmark for visual reasoning in multimodal models.
Rank  Model              Provider   Score
1     Gemini 3 Pro       Google     ~87%
2     GPT-5              OpenAI     84.2%
3     OpenAI o3          OpenAI     82.9%
4     OpenAI o4-mini     OpenAI     81.6%
5     Gemini 2.5 Pro     Google     ~79.6%
6     Claude Sonnet 4.5  Anthropic  ~77.8%
7     Gemini 2.5 Flash   Google     ~76.9%
8     GPT-4.1            OpenAI     74.8%
9     Llama 4 Maverick   Meta       73.4%
10    Llama 4 Scout      Meta       69.4%
11    GPT-4o             OpenAI     69.1%
12    Amazon Nova Pro    Amazon     61.7%
~ marks community-reported or version-normalized figures; all others come from official model cards. Updated 2026-06-10.