The Small Model Squeeze

Why budget LLMs face an existential competitive threat

"One big problem with small models is that the cheaper BIG models are just unbelievably inexpensive - realistically the competition is things like Gemini 2.5 Flash-Lite and that model will process 1 billion tokens for $100"

Gemini 2.5 Flash-Lite: $75 per 1 billion input tokens

True "small" models: $35-50 per 1 billion input tokens (ministral-3b, llama-3.1-8b, nova-micro)

Premium for going "capable": only 1.5-2x, far smaller than the savings the size gap would suggest

Capability gap: 10-50x difference in parameter count (3B vs. 100B+)
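That "1.5-2x" figure follows directly from the input prices on the cards above. A quick sanity check, using only the per-billion-input-token prices quoted here ($75 for Gemini 2.5 Flash-Lite, $35-$50 for true small models):

```python
# Prices from the cards above, in $ per 1 billion input tokens.
flash_lite = 75
tiny_low, tiny_high = 35, 50

premium_low = flash_lite / tiny_high   # vs. the priciest tiny model
premium_high = flash_lite / tiny_low   # vs. the cheapest tiny model

print(f"Capable-model premium: {premium_low:.1f}x - {premium_high:.1f}x")
# Capable-model premium: 1.5x - 2.1x
```

So even at the extreme, choosing a 3B model over a ~100B+ one saves roughly half the bill, while giving up 10-50x in parameters.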

📊 The Competitive Landscape (1B input + 100M output tokens)

Model                     Tier               Total Cost
ministral-3b              Tiny (3B)          $44
gemini-1.5-flash-8b       Tiny (8B)          $53
llama-3.1-8b              Tiny (8B)          $58
gemini-2.5-flash-lite ⭐   Capable (~100B+)   $105
gpt-4.1-nano              Small              $140
gemini-2.5-flash ⭐        Capable (~100B+)   $210
claude-3-haiku            Small (~20B)       $375
deepseek-v3               Capable (MoE)      $380
gpt-4.1-mini              Small-Mid          $560
claude-haiku-4            Small-Mid          $1,200
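The totals above come from a simple blended-workload formula: total = input_tokens × input price + output_tokens × output price, over 1B input and 100M output tokens. A minimal sketch, assuming per-million-token rates consistent with the table's totals (Flash-Lite at $0.075/M input and $0.30/M output, ministral-3b at $0.04/M for both; the output rates are inferred to match the table, not quoted in this document):

```python
def workload_cost(input_price_per_m: float, output_price_per_m: float,
                  input_tokens: int = 1_000_000_000,
                  output_tokens: int = 100_000_000) -> float:
    """Total $ cost of a workload, given $-per-million-token prices."""
    return ((input_tokens / 1e6) * input_price_per_m
            + (output_tokens / 1e6) * output_price_per_m)

# Assumed rates chosen to reproduce the table's totals:
print(workload_cost(0.075, 0.30))  # gemini-2.5-flash-lite -> 105.0
print(workload_cost(0.04, 0.04))   # ministral-3b -> 44.0
```

Note how output pricing shifts the picture: on an input-heavy workload like this one, input price dominates, which is exactly the regime where Flash-Lite's $75/1B input rate squeezes the tiny models hardest.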
Tiny models (3-8B params): $44-58 (ministral-3b, llama-8b, flash-8b)
💀 vs. 💀
Capable "lite" models: $105 (gemini-2.5-flash-lite)


Analysis generated using LLM Cost Analysis Skill • Prices as of 2025-06