meta-llama/Llama-4-Scout-17B-16E-Instruct

by meta

Pricing

Input $0.07 / 1M tokens
Output $0.24 / 1M tokens

LLaMA 4 Scout Release by Meta AI


Overview

LLaMA 4 Scout is a compact, highly efficient model in Meta’s LLaMA 4 family, released in April 2025. It offers massive context length, strong multimodal capabilities, and optimized performance on low-latency hardware.

  • Release Date: April 5, 2025
  • Multimodal Support: Text + Images (early fusion)
  • Context Length: 10 million tokens
  • Model Size: 109B total / 17B active (Mixture-of-Experts with 16 experts)

Features

  • Sparse Mixture-of-Experts architecture
  • Multilingual: trained on 12+ languages
  • High performance in reasoning, math, and coding
  • Native image understanding with early fusion
  • Optimized for low-cost inference on single H100 GPU

Benchmarks

Task LLaMA 4 Scout GPT-4o Mini Gemini Flash
MMLU (reasoning) ~75.2% 82.0% 77.9%
MGSM (math) 87.0% 79.7%
Context Handling 10M tokens 128K tokens 128K tokens

Note: Some benchmarks are community-reported; Meta has not officially published full evals.


Access & Pricing

Type Availability
Model Weights Free (LLaMA 4 Community License)
Inference Runs on single H100 GPU

Use Cases

  • On-device and edge inference
  • Fast document understanding and summarization
  • Lightweight image-based Q&A
  • Massive context document analysis
  • Multimodal assistant agents

Safety and Alignment

  • Fine-tuned with Meta’s alignment pipeline
  • Moderation filters and instruction following
  • Knowledge cutoff: August 2024

Limitations

  • No audio or video input/output
  • Limited published benchmarks
  • No native web browsing or plugin support

License

Released under the LLaMA 4 Community License. Commercial use is permitted within specified user limits (~700M MAUs).