Llama 4 Scout

Meta · Apr 2025 · open weights · Llama 4 Community License

Llama 4 Scout's headline feature remains unmatched a year later: a 10 million token context window, an order of magnitude beyond anything else in this catalog. Real-world recall at extreme lengths is imperfect, but for sheer ingest capacity — entire codebases, document archives, books — nothing else comes close. It runs (quantized) on a single H100, making it a practical local multimodal workhorse.

Where it shines

  • 10M-token context window, the largest in this catalog
  • Single-GPU deployable (quantized, on one H100) at 109B total / 17B active parameters
  • Native multimodal input
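As a rough check on the single-GPU claim, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes an 80 GB H100 and ignores the KV cache, activations, and runtime overhead, so real headroom is smaller than shown. The helper name `weight_memory_gb` is illustrative. Note that all 109B parameters have to be resident in memory, even though only 17B are active per token.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 109e9   # total parameter count quoted above
GPU_MEMORY_GB = 80     # assumption: an 80 GB H100

for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb < GPU_MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:6.1f} GB -> {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions only the 4-bit case (about 54.5 GB) leaves room on one card, which is consistent with the "quantized" qualifier.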

Frequently asked questions

How much does the Llama 4 Scout API cost?
Llama 4 Scout costs $0.18 per million input tokens and $0.59 per million output tokens. A workload of 10M input and 1.5M output tokens per month costs about $2.69.
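The monthly figure can be reproduced with a short calculation. This is a minimal sketch using the per-million-token prices quoted above. The function name `monthly_cost_usd` is illustrative, and `Decimal` is used so the half-cent result rounds predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

INPUT_PRICE_PER_M = Decimal("0.18")   # USD per million input tokens
OUTPUT_PRICE_PER_M = Decimal("0.59")  # USD per million output tokens
TOKENS_PER_M = Decimal(1_000_000)


def monthly_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost in USD, rounded half-up to the nearest cent."""
    raw = (
        Decimal(input_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
        + Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    )
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 10M input + 1.5M output tokens: 1.80 + 0.885 = 2.685
print(monthly_cost_usd(10_000_000, 1_500_000))  # 2.69
```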
What is the context window of Llama 4 Scout?
Llama 4 Scout supports a context window of 10,000,000 tokens (10M).
Is Llama 4 Scout open source?
Yes, in the open-weights sense: Llama 4 Scout is released under the Llama 4 Community License, so its weights can be downloaded and self-hosted.
What are the best alternatives to Llama 4 Scout?
The closest alternatives by overall capability are Qwen3-Max, GPT-5.5, Gemini 3 Pro, and Gemini 2.5 Pro. See the comparison pages for detailed head-to-head breakdowns.