Llama 4 Scout

Meta · Llama 4

Efficiency-focused Llama 4 tier for customizable deployments with tighter compute budgets.

Part of Llama family · Other versions: Llama 4 Maverick
Type
multimodal
Context
10M tokens
Max Output
33K tokens
Status
current
API Access
Yes
License
Llama Community
open-weights · efficient · self-hosted · automation · customization
Released April 2025 · Updated March 6, 2026

Overview

Freshness note: Model capabilities, deployment options, and licensing terms can change. This profile is a point-in-time snapshot last verified on February 15, 2026.

Llama 4 Scout is Meta’s efficiency-oriented open-weight Llama 4 model for teams that need customization with lower serving costs than larger open models. Meta’s official launch materials describe Scout as a natively multimodal MoE model with 17B active parameters across 16 experts and a 10 million token context window.
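The MoE arithmetic above can be sketched with back-of-envelope numbers. The 17B active-parameter figure is from Meta's launch materials; the roughly 109B total-parameter figure is also commonly cited for Scout, but treat both as illustrative rather than a serving-memory calculator:

```python
# Back-of-envelope MoE accounting for Llama 4 Scout.
# 17B active parameters per token (Meta's launch figure);
# ~109B total parameters (commonly cited for Scout) -- all experts
# stay resident in memory, but only a fraction run per forward pass.

ACTIVE_PARAMS = 17e9    # shared layers + routed experts used per token
TOTAL_PARAMS = 109e9    # full parameter count kept in memory
NUM_EXPERTS = 16        # routed experts per MoE layer

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")  # ~15.6%
```

This is why MoE models like Scout can offer lower per-token compute than a dense model of comparable total size, while still requiring memory for the full parameter count.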

Capabilities

Scout is typically used for structured assistant tasks, summarization, extraction, multimodal understanding, and moderate reasoning workflows. It performs best when prompts and task domains are well defined.
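For extraction-style tasks, a tightly specified prompt matters more than model size. The sketch below builds a request body in the OpenAI-compatible chat format many Scout hosts expose; the model id is a placeholder assumption, so check your provider's catalog for the exact name:

```python
import json

# Hypothetical request body for an OpenAI-compatible chat endpoint.
# The model id below is an assumption, not an official identifier --
# providers publish their own names for hosted Scout.
payload = {
    "model": "meta-llama/llama-4-scout",  # placeholder id
    "messages": [
        {"role": "system",
         "content": "Extract fields as JSON: {name, date, amount}."},
        {"role": "user",
         "content": "Invoice from Acme Corp, dated 2026-01-10, total $420.00."},
    ],
    "temperature": 0.0,   # keep extraction output as deterministic as possible
    "max_tokens": 256,
}
print(json.dumps(payload, indent=2))
```

Pinning temperature to 0 and constraining the output schema in the system message is the kind of well-defined task framing Scout tends to handle best.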

Technical Details

Meta positions Scout as the more deployable of the two initial Llama 4 models, capable of running on a single H100 GPU with Int4 quantization. Performance outcomes still depend on runtime optimizations, serving stack choices, and evaluation quality.
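The single-H100 claim can be sanity-checked with a rough weight-memory estimate. Assuming ~109B total parameters (a commonly cited figure for Scout) at 4 bits each, and ignoring KV cache and activation overhead, which real deployments must budget for:

```python
# Rough Int4 weight-memory estimate for Scout.
# Assumes ~109B total parameters at 4 bits per parameter.
# Excludes KV cache, activations, and framework overhead.

TOTAL_PARAMS = 109e9
BITS_PER_PARAM = 4

weight_bytes = TOTAL_PARAMS * BITS_PER_PARAM / 8
weight_gb = weight_bytes / 1e9
print(f"Int4 weights alone: ~{weight_gb:.1f} GB")  # ~54.5 GB vs 80 GB on an H100
```

That leaves headroom on an 80 GB H100, but long-context workloads can consume much of the remainder in KV cache, which is one reason runtime and serving-stack choices matter so much.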

Pricing & Access

There is no single universal price because Scout can be self-hosted on your own infrastructure or consumed through provider-hosted inference, which is typically metered per token. Teams should model both compute and operational overhead when comparing against closed API alternatives.
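A simple way to frame the comparison is effective cost per million tokens for a self-hosted deployment. Every number in the sketch below is a placeholder assumption, not a real price or benchmark; substitute your GPU rental rate, measured throughput, and utilization:

```python
# Hypothetical self-hosted cost-per-token model.
# All inputs are placeholder assumptions -- replace with your own
# GPU rate, measured sustained throughput, and utilization.

gpu_hourly_usd = 4.00      # assumed H100 rental rate
tokens_per_second = 900    # assumed sustained generation throughput
utilization = 0.5          # fraction of each hour doing useful work

tokens_per_hour = tokens_per_second * 3600 * utilization
self_hosted_per_million = gpu_hourly_usd / (tokens_per_hour / 1e6)
print(f"Self-hosted: ~${self_hosted_per_million:.2f} per 1M tokens")
```

The utilization term is often the deciding factor: a GPU idling between bursts of traffic can make self-hosting far more expensive per token than a metered API, while steady high-volume workloads flip the comparison.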

Best Use Cases

Strong fit for internal copilots, domain-specific automation, and budget-constrained environments that still require control over deployment and data boundaries.

Comparisons

Compared with Llama 4 Maverick, Scout favors lower cost and throughput over maximum quality. Compared with GPT-5 nano or Gemini 2.5 Flash-Lite, Scout provides more control but often needs more engineering investment to operate well.