MMMU

Multimodal

Massive Multi-discipline Multimodal Understanding

11,500 college-level questions across 30 subjects that require jointly understanding text and images (charts, diagrams, maps, sheet music, chemical structures). A widely used benchmark for visual reasoning in multimodal models.

Rank  Model              Provider   Price (input/output)  Score
1     Gemini 3 Pro       Google     $2/$12                ~87%
2     GPT-5              OpenAI     $1.25/$10             84.2%
3     OpenAI o3          OpenAI     $2/$8                 82.9%
4     OpenAI o4-mini     OpenAI     $1.1/$4.4             81.6%
5     Gemini 2.5 Pro     Google     $1.25/$10             ~79.6%
6     Claude Sonnet 4.5  Anthropic  $3/$15                ~77.8%
7     Gemini 2.5 Flash   Google     $0.3/$2.5             ~76.9%
8     GPT-4.1            OpenAI     $2/$8                 74.8%
9     Llama 4 Maverick   Meta       $0.27/$0.85           73.4%
10    Llama 4 Scout      Meta       $0.18/$0.59           69.4%
11    GPT-4o             OpenAI     $2.5/$10              69.1%
12    Amazon Nova Pro    Amazon     $0.8/$3.2             61.7%

~ marks community-reported or version-normalized figures; all other scores come from official model cards. Prices are shown as input/output per 1M tokens. Updated 2026-06-10.