meta-llama/Llama-4-Scout-17B-16E-Instruct
by meta
Pricing
Type | Price |
---|---|
Input | $0.07 / 1M tokens |
Output | $0.24 / 1M tokens |
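As a quick sanity check on the rates above, per-request cost is a simple linear function of token counts. A minimal sketch (the helper name is illustrative, not a provider API):

```python
# Listed per-million-token rates for LLaMA 4 Scout (from the pricing above).
INPUT_PRICE_PER_M = 0.07   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.24  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 1,000-token completion.
print(round(estimate_cost(10_000, 1_000), 6))  # 0.00094

# Even a fully packed 10M-token context costs ~$0.70 in input tokens.
print(round(estimate_cost(10_000_000, 0), 2))  # 0.7
```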
LLaMA 4 Scout Release by Meta AI
Overview
LLaMA 4 Scout is a compact, highly efficient model in Meta’s LLaMA 4 family, released in April 2025. It offers a massive context window, strong multimodal capabilities, and is optimized for low-latency, low-cost inference.
- Release Date: April 5, 2025
- Multimodal Support: Text + Images (early fusion)
- Context Length: 10 million tokens
- Model Size: 109B total / 17B active (Mixture-of-Experts with 16 experts)
Features
- Sparse Mixture-of-Experts architecture
- Multilingual: supports 12+ languages
- High performance in reasoning, math, and coding
- Native image understanding with early fusion
- Optimized for low-cost inference on a single H100 GPU (with Int4 quantization)
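The sparse Mixture-of-Experts design is why only 17B of the 109B parameters are active per token: a router sends each token to a small subset of expert networks. A toy top-1 routing sketch (illustrative dimensions, not Meta’s actual architecture or routing scheme):

```python
import numpy as np

# Toy sparse MoE layer: each token is routed to its top-1 expert, so only
# 1/n_experts of the expert parameters is active per token. All sizes here
# are illustrative, not the real LLaMA 4 Scout configuration.
rng = np.random.default_rng(0)

n_experts, d_model, d_ff = 16, 8, 32
experts = [rng.standard_normal((d_model, d_ff)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_ff); each token uses one expert."""
    logits = x @ router                # (tokens, n_experts) routing scores
    choice = logits.argmax(axis=-1)    # top-1 expert index per token
    out = np.empty((x.shape[0], d_ff))
    for i, e in enumerate(choice):
        out[i] = x[i] @ experts[e]     # only the chosen expert runs
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)  # (4, 32)
```

Because every token multiplies against just one expert matrix, compute per token scales with the active parameters, not the total parameter count.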
Benchmarks
Task | LLaMA 4 Scout | GPT-4o Mini | Gemini Flash |
---|---|---|---|
MMLU (reasoning) | ~75.2% | 82.0% | 77.9% |
MGSM (math) | — | 87.0% | 79.7% |
Context Handling | 10M tokens | 128K tokens | 128K tokens |
Note: Some benchmarks are community-reported; Meta has not officially published full evals.
Access & Pricing
Type | Availability |
---|---|
Model Weights | Free (LLaMA 4 Community License) |
Inference | Runs on a single H100 GPU |
Use Cases
- On-device and edge inference
- Fast document understanding and summarization
- Lightweight image-based Q&A
- Massive context document analysis
- Multimodal assistant agents
Safety and Alignment
- Fine-tuned with Meta’s alignment pipeline
- Moderation filters and instruction following
- Knowledge cutoff: August 2024
Limitations
- No audio or video input/output
- Limited published benchmarks
- No native web browsing or plugin support
License
Released under the LLaMA 4 Community License. Commercial use is permitted, except that services exceeding ~700M monthly active users require a separate license from Meta.