meta-llama/Llama-4-Scout-17B-16E-Instruct
by meta
Pricing
Type | Price |
---|---|
Input | $0.07 / 1M tokens |
Output | $0.24 / 1M tokens |
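As a quick sanity check on the rates above, per-request cost is a simple linear function of token counts. A minimal sketch (the helper name is illustrative, not a provider API):

```python
# Listed per-million-token rates for LLaMA 4 Scout (from the pricing above).
INPUT_PRICE_PER_M = 0.07   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.24  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 1,000-token completion.
print(round(estimate_cost(10_000, 1_000), 6))  # 0.00094

# Even a fully packed 10M-token context costs ~$0.70 in input tokens.
print(round(estimate_cost(10_000_000, 0), 2))  # 0.7
```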
LLaMA 4 Scout Release by Meta AI
Overview
LLaMA 4 Scout is a compact, highly efficient model in Meta’s LLaMA 4 family, released in April 2025. It offers a massive context window, strong multimodal capabilities, and is optimized for low-latency, low-cost inference.
- Release Date: April 5, 2025
- Multimodal Support: Text + Images (early fusion)
- Context Length: 10 million tokens
- Model Size: 109B total / 17B active (Mixture-of-Experts with 16 experts)
Features
- Sparse Mixture-of-Experts architecture
- Multilingual: supports 12+ languages
- High performance in reasoning, math, and coding
- Native image understanding with early fusion
- Optimized for low-cost inference on a single H100 GPU (with Int4 quantization)
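The sparse Mixture-of-Experts design is why only 17B of the 109B parameters are active per token: a router sends each token to a small subset of expert networks. A toy top-1 routing sketch (illustrative dimensions, not Meta’s actual architecture or routing scheme):

```python
import numpy as np

# Toy sparse MoE layer: each token is routed to its top-1 expert, so only
# 1/n_experts of the expert parameters is active per token. All sizes here
# are illustrative, not the real LLaMA 4 Scout configuration.
rng = np.random.default_rng(0)

n_experts, d_model, d_ff = 16, 8, 32
experts = [rng.standard_normal((d_model, d_ff)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_ff); each token uses one expert."""
    logits = x @ router                # (tokens, n_experts) routing scores
    choice = logits.argmax(axis=-1)    # top-1 expert index per token
    out = np.empty((x.shape[0], d_ff))
    for i, e in enumerate(choice):
        out[i] = x[i] @ experts[e]     # only the chosen expert runs
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)  # (4, 32)
```

Because every token multiplies against just one expert matrix, compute per token scales with the active parameters, not the total parameter count.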
Benchmarks
Task | LLaMA 4 Scout | GPT-4o Mini | Gemini Flash |
---|---|---|---|
MMLU (reasoning) | ~75.2% | 82.0% | 77.9% |
MGSM (math) | — | 87.0% | 79.7% |
Context Handling | 10M tokens | 128K tokens | 128K tokens |
Note: Some benchmarks are community-reported; Meta has not officially published full evals.
Access & Pricing
Type | Availability |
---|---|
Model Weights | Free (LLaMA 4 Community License) |
Inference | Runs on a single H100 GPU |
Use Cases
- On-device and edge inference
- Fast document understanding and summarization
- Lightweight image-based Q&A
- Massive context document analysis
- Multimodal assistant agents
Safety and Alignment
- Fine-tuned with Meta’s alignment pipeline
- Moderation filters and instruction following
- Knowledge cutoff: August 2024
Limitations
- No audio or video input/output
- Limited published benchmarks
- No native web browsing or plugin support
License
Released under the LLaMA 4 Community License. Commercial use is permitted, except that services exceeding ~700M monthly active users require a separate license from Meta.