# Intelligent Ollama Model Selector
AI-powered CLI that analyzes your hardware and recommends optimal LLM models
Deterministic scoring across 35+ curated models with hardware-calibrated memory estimation
Installation • Quick Start • Claude MCP • Commands • Scoring • Hardware
Choosing the right LLM for your hardware is complex. With thousands of model variants, quantization levels, and hardware configurations, finding the optimal model requires understanding memory bandwidth, VRAM limits, and performance characteristics.
LLM Checker solves this. It analyzes your system, scores every compatible model across four dimensions (Quality, Speed, Fit, Context), and delivers actionable recommendations in seconds.
| Feature | Description |
|---|---|
| **35+ Curated Models** | Hand-picked catalog covering all major families and sizes (1B-32B) |
| **4D Scoring Engine** | Quality, Speed, Fit, Context — weighted by use case |
| **Multi-GPU Hardware Detection** | Apple Silicon, NVIDIA CUDA, AMD ROCm, Intel Arc, CPU |
| **Calibrated Memory Estimation** | Bytes-per-parameter formula validated against real Ollama sizes |
| **Zero Native Dependencies** | Pure JavaScript — works on any Node.js 16+ system |
| **Optional SQLite Search** | Install `sql.js` to unlock `sync`, `search`, and `smart-recommend` |
## Installation

```bash
# Install globally
npm install -g llm-checker

# Or run directly with npx
npx llm-checker hw-detect
```

Requirements:
- Node.js 16+ (any version: 16, 18, 20, 22, 24)
- Ollama installed for running models
Optional: For database search features (`sync`, `search`, `smart-recommend`):

```bash
npm install sql.js
```

## Quick Start

```bash
# 1. Detect your hardware capabilities
llm-checker hw-detect

# 2. Get full analysis with compatible models
llm-checker check

# 3. Get intelligent recommendations by category
llm-checker recommend

# 4. (Optional) Sync full database and search
llm-checker sync
llm-checker search qwen --use-case coding
```

## Claude MCP

LLM Checker includes a built-in Model Context Protocol (MCP) server, allowing Claude Code and other MCP-compatible AI assistants to analyze your hardware and manage local models directly.
```bash
# Install globally first
npm install -g llm-checker

# Add to Claude Code
claude mcp add llm-checker -- llm-checker-mcp
```

Or with npx (no global install needed):

```bash
claude mcp add llm-checker -- npx llm-checker-mcp
```

Restart Claude Code and you're done.
Once connected, Claude can use these tools:
**Core Analysis:**

| Tool | Description |
|---|---|
| `hw_detect` | Detect your hardware (CPU, GPU, RAM, acceleration backend) |
| `check` | Full compatibility analysis with all models ranked by score |
| `recommend` | Top model picks by category (coding, reasoning, multimodal, etc.) |
| `installed` | Rank your already-downloaded Ollama models |
| `search` | Search the Ollama model catalog with filters |
| `smart_recommend` | Advanced recommendations using the full scoring engine |

**Ollama Management:**

| Tool | Description |
|---|---|
| `ollama_list` | List all downloaded models with params, quant, family, and size |
| `ollama_pull` | Download a model from the Ollama registry |
| `ollama_run` | Run a prompt against a local model (with tok/s metrics) |
| `ollama_remove` | Delete a model to free disk space |

**Advanced (MCP-exclusive):**

| Tool | Description |
|---|---|
| `ollama_optimize` | Generate optimal Ollama env vars for your hardware (NUM_GPU, PARALLEL, FLASH_ATTENTION, etc.) |
| `benchmark` | Benchmark a model with 3 standardized prompts — measures tok/s, load time, prompt eval |
| `compare_models` | Head-to-head comparison of two models on the same prompt, with speed and responses side by side |
| `cleanup_models` | Analyze installed models — find redundancies, cloud-only models, oversized models, and upgrade candidates |
| `project_recommend` | Scan a project directory (languages, frameworks, size) and recommend the best model for that codebase |
| `ollama_monitor` | Real-time system status: RAM usage, loaded models, memory headroom analysis |
After setup, you can ask Claude things like:
- "What's the best coding model for my hardware?"
- "Benchmark qwen2.5-coder and show me the tok/s"
- "Compare llama3.2 vs codellama for coding tasks"
- "Clean up my Ollama — what should I remove?"
- "What model should I use for this Rust project?"
- "Optimize my Ollama config for maximum performance"
- "How much RAM is Ollama using right now?"
Claude will automatically call the right tools and give you actionable results.
## Commands

**Core:**

| Command | Description |
|---|---|
| `hw-detect` | Detect GPU/CPU capabilities, memory, backends |
| `check` | Full system analysis with compatible models and recommendations |
| `recommend` | Intelligent recommendations by category (coding, reasoning, multimodal, etc.) |
| `installed` | Rank your installed Ollama models by compatibility |

**Database (optional, requires `sql.js`):**

| Command | Description |
|---|---|
| `sync` | Download the latest model catalog from the Ollama registry |
| `search <query>` | Search models with filters and intelligent scoring |
| `smart-recommend` | Advanced recommendations using the full scoring engine |

**AI-powered:**

| Command | Description |
|---|---|
| `ai-check` | AI-powered model evaluation with meta-analysis |
| `ai-run` | AI-powered model selection and execution |
### `hw-detect` example

```bash
llm-checker hw-detect
```

```
Summary:
  Apple M4 Pro (24GB Unified Memory)
  Tier: MEDIUM HIGH
  Max model size: 15GB
  Best backend: metal

CPU:
  Apple M4 Pro
  Cores: 12 (12 physical)
  SIMD: NEON

Metal:
  GPU Cores: 16
  Unified Memory: 24GB
  Memory Bandwidth: 273GB/s
```
### `recommend` example

```bash
llm-checker recommend
```

```
INTELLIGENT RECOMMENDATIONS BY CATEGORY
Hardware Tier: HIGH | Models Analyzed: 205

Coding:
  qwen2.5-coder:14b (14B)
  Score: 78/100
  Command: ollama pull qwen2.5-coder:14b

Reasoning:
  deepseek-r1:14b (14B)
  Score: 86/100
  Command: ollama pull deepseek-r1:14b

Multimodal:
  llama3.2-vision:11b (11B)
  Score: 83/100
  Command: ollama pull llama3.2-vision:11b
```
### `search` examples

```bash
llm-checker search llama -l 5
llm-checker search coding --use-case coding
llm-checker search qwen --quant Q4_K_M --max-size 8
```

| Option | Description |
|---|---|
| `-l, --limit <n>` | Number of results (default: 10) |
| `-u, --use-case <type>` | Optimize for: general, coding, chat, reasoning, creative, fast |
| `--max-size <gb>` | Maximum model size in GB |
| `--quant <type>` | Filter by quantization: Q4_K_M, Q8_0, FP16, etc. |
| `--family <name>` | Filter by model family |
## Model Catalog

The built-in catalog includes 35+ models from the most popular Ollama families:
| Family | Models | Best For |
|---|---|---|
| Qwen 2.5/3 | 7B, 14B, Coder 7B/14B/32B, VL 3B/7B | Coding, general, vision |
| Llama 3.x | 1B, 3B, 8B, Vision 11B | General, chat, multimodal |
| DeepSeek | R1 8B/14B/32B, Coder V2 16B | Reasoning, coding |
| Phi-4 | 14B | Reasoning, math |
| Gemma 2 | 2B, 9B | General, efficient |
| Mistral | 7B, Nemo 12B | Creative, chat |
| CodeLlama | 7B, 13B | Coding |
| LLaVA | 7B, 13B | Vision |
| Embeddings | nomic-embed-text, mxbai-embed-large, bge-m3, all-minilm | RAG, search |
Models are automatically combined with any locally installed Ollama models for scoring.
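A rough sketch of what that merge might look like, assuming models are deduped by name and that installed entries override catalog duplicates (the entry shape and override rule are illustrative, not the project's actual data structures):

```javascript
// Illustrative sketch of the catalog/installed merge, not the real code.
function mergeModelPool(catalog, installed) {
  const byName = new Map();
  for (const model of [...catalog, ...installed]) {
    // Later entries win, so locally installed metadata overrides the catalog.
    byName.set(model.name, { ...byName.get(model.name), ...model });
  }
  return [...byName.values()];
}

const pool = mergeModelPool(
  [{ name: 'qwen2.5-coder:14b', params: 14 }],
  [{ name: 'qwen2.5-coder:14b', params: 14, installed: true }]
);
console.log(pool); // one deduped entry, marked as installed
```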
## Scoring

Models are evaluated across four dimensions, weighted by use case:
| Dimension | Description |
|---|---|
| Q Quality | Model family reputation + parameter count + quantization penalty |
| S Speed | Estimated tokens/sec based on hardware backend and model size |
| F Fit | Memory utilization efficiency (how well it fits in available RAM) |
| C Context | Context window capability vs. target context length |
Three scoring systems are available, each optimized for a different workflow. The two weight-based systems are shown below; the third, AI-powered path backs `ai-check` and `ai-run`.

**Deterministic Selector** (primary — used by `check` and `recommend`):

| Category | Quality | Speed | Fit | Context |
|---|---|---|---|---|
| `general` | 45% | 35% | 15% | 5% |
| `coding` | 55% | 20% | 15% | 10% |
| `reasoning` | 60% | 10% | 20% | 10% |
| `multimodal` | 50% | 15% | 20% | 15% |
**Scoring Engine** (used by `smart-recommend` and `search`):

| Use Case | Quality | Speed | Fit | Context |
|---|---|---|---|---|
| `general` | 40% | 35% | 15% | 10% |
| `coding` | 55% | 20% | 15% | 10% |
| `reasoning` | 60% | 15% | 10% | 15% |
| `chat` | 40% | 40% | 15% | 5% |
| `fast` | 25% | 55% | 15% | 5% |
| `quality` | 65% | 10% | 15% | 10% |
All weights are centralized in `src/models/scoring-config.js`.
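Concretely, a category score is a weighted sum of the four normalized dimensions. Below is a minimal sketch, assuming each dimension is already normalized to 0..1; the weights are copied from the deterministic table above, and `score4D` is an illustrative name, not the project's API:

```javascript
// Minimal sketch of the 4D weighted score; weights copied from the
// deterministic table above, dimension values assumed pre-normalized to 0..1.
const WEIGHTS = {
  general:    { quality: 0.45, speed: 0.35, fit: 0.15, context: 0.05 },
  coding:     { quality: 0.55, speed: 0.20, fit: 0.15, context: 0.10 },
  reasoning:  { quality: 0.60, speed: 0.10, fit: 0.20, context: 0.10 },
  multimodal: { quality: 0.50, speed: 0.15, fit: 0.20, context: 0.15 },
};

function score4D(dims, category) {
  const w = WEIGHTS[category];
  return Math.round(100 * (
    w.quality * dims.quality +
    w.speed   * dims.speed +
    w.fit     * dims.fit +
    w.context * dims.context
  ));
}

// A strong but slower coding model that fits memory comfortably:
console.log(score4D({ quality: 0.9, speed: 0.45, fit: 0.8, context: 0.7 }, 'coding')); // 78
```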
## Memory Estimation

Memory requirements are calculated using calibrated bytes-per-parameter values:
| Quantization | Bytes/Param | 7B Model | 14B Model | 32B Model |
|---|---|---|---|---|
| Q8_0 | 1.05 | ~8 GB | ~16 GB | ~35 GB |
| Q4_K_M | 0.58 | ~5 GB | ~9 GB | ~20 GB |
| Q3_K | 0.48 | ~4 GB | ~8 GB | ~17 GB |
The selector automatically picks the best quantization that fits your available memory.
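As a worked example: the estimate is roughly parameter count times bytes per parameter, plus headroom for context and runtime buffers. In the sketch below, the 15% overhead constant is an assumption tuned to roughly match the table, not the project's actual calibration:

```javascript
// Sketch of the bytes-per-parameter estimate. BYTES_PER_PARAM comes from the
// table above; the 15% overhead margin is an assumption, not the real constant.
const BYTES_PER_PARAM = { Q8_0: 1.05, Q4_K_M: 0.58, Q3_K: 0.48 };
const OVERHEAD = 1.15; // hypothetical margin for KV cache + runtime buffers

function estimateMemoryGB(paramsBillions, quant) {
  return paramsBillions * BYTES_PER_PARAM[quant] * OVERHEAD;
}

// Pick the highest-quality quantization that fits the memory budget,
// mirroring the selection behavior described above.
function bestQuant(paramsBillions, budgetGB) {
  const byQuality = ['Q8_0', 'Q4_K_M', 'Q3_K']; // best quality first
  return byQuality.find(q => estimateMemoryGB(paramsBillions, q) <= budgetGB) ?? null;
}

console.log(estimateMemoryGB(14, 'Q4_K_M').toFixed(1)); // ≈ 9.3 GB, matching the ~9 GB row
console.log(bestQuant(14, 15));                         // 'Q4_K_M' on a 15 GB budget
```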
## Supported Hardware

**Apple Silicon**
- M1, M1 Pro, M1 Max, M1 Ultra
- M2, M2 Pro, M2 Max, M2 Ultra
- M3, M3 Pro, M3 Max
- M4, M4 Pro, M4 Max
**NVIDIA (CUDA)**
- RTX 50 Series (5090, 5080, 5070 Ti, 5070)
- RTX 40 Series (4090, 4080, 4070 Ti, 4070, 4060 Ti, 4060)
- RTX 30 Series (3090 Ti, 3090, 3080 Ti, 3080, 3070 Ti, 3070, 3060 Ti, 3060)
- Data Center (H100, A100, A10, L40, T4)
**AMD (ROCm)**
- RX 7900 XTX, 7900 XT, 7800 XT, 7700 XT
- RX 6900 XT, 6800 XT, 6800
- Instinct MI300X, MI300A, MI250X, MI210
**Intel**
- Arc A770, A750, A580, A380
- Integrated Iris Xe, UHD Graphics
**CPU Backends**
- AVX-512 + AMX (Intel Sapphire Rapids, Emerald Rapids)
- AVX-512 (Intel Ice Lake+, AMD Zen 4)
- AVX2 (Most modern x86 CPUs)
- ARM NEON (Apple Silicon, AWS Graviton, Ampere Altra)
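As a simplified illustration of the CPU side of detection, the sketch below uses Node's built-in `os` module; the AVX2 fallback is an assumption, since the real detectors in `src/hardware/` probe GPUs and SIMD feature flags directly:

```javascript
// Simplified, illustrative CPU detection; not the project's actual detector.
const os = require('os');

const cpu = os.cpus()[0].model;
const arch = os.arch();
const ramGB = os.totalmem() / 2 ** 30;

// arm64 implies NEON (Apple Silicon, Graviton, Altra). On x86 a real detector
// would check CPUID feature flags for AVX2/AVX-512, which Node does not
// expose directly, so AVX2 here is just an assumption for the sketch.
const simd = arch === 'arm64' ? 'NEON' : 'AVX2 (assumed)';

console.log(`${cpu} | ${arch} | ${ramGB.toFixed(0)} GB RAM | SIMD: ${simd}`);
```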
## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Hardware     │────>│      Model      │────>│  Deterministic  │
│    Detection    │     │  Catalog (35+)  │     │    Selector     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
  Detects GPU/CPU         JSON catalog +            4D scoring
  Memory / Backend       Installed models      Per-category weights
 Usable memory calc         Auto-dedup          Memory calibration
                                                         │
                                                         v
                                                ┌─────────────────┐
                                                │     Ranked      │
                                                │ Recommendations │
                                                └─────────────────┘
```
**Selector Pipeline** (a condensed sketch follows the list):

1. Hardware profiling — CPU, GPU, RAM, acceleration backend
2. Model pool — merge catalog + installed Ollama models (deduped)
3. Category filter — keep models relevant to the use case
4. Quantization selection — best quant that fits in the memory budget
5. 4D scoring — Q, S, F, C with category-specific weights
6. Ranking — top N candidates returned
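A condensed, self-contained sketch of those six steps; the model entries, category tags, fit check, and scoring placeholder are all illustrative stand-ins for the real implementation:

```javascript
// Condensed sketch of the selector pipeline; data and scoring are placeholders.
const catalog = [
  { name: 'qwen2.5-coder:14b', params: 14, categories: ['coding'], quality: 0.9 },
  { name: 'llama3.2:3b', params: 3, categories: ['general', 'coding'], quality: 0.6 },
];
const installed = [{ name: 'llama3.2:3b', params: 3, categories: ['general', 'coding'], quality: 0.6 }];

function recommend(category, budgetGB, topN = 3) {
  // 1-2. model pool: merge catalog + installed, deduped by name
  const pool = [...new Map([...catalog, ...installed].map(m => [m.name, m])).values()];
  return pool
    .filter(m => m.categories.includes(category))                // 3. category filter
    .map(m => ({ ...m, quant: m.params * 0.58 * 1.15 <= budgetGB // 4. fit check (Q4_K_M
      ? 'Q4_K_M' : null }))                                      //    only, for brevity)
    .filter(m => m.quant)
    .map(m => ({ ...m, score: Math.round(100 * m.quality) }))    // 5. scoring placeholder
    .sort((a, b) => b.score - a.score)                           // 6. ranking
    .slice(0, topN);
}

console.log(recommend('coding', 15)); // both fit a 15 GB budget; the 14B coder ranks first
```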
## Examples

Detect your hardware:

```bash
llm-checker hw-detect
```

Get recommendations for all categories:

```bash
llm-checker recommend
```

Full system analysis with compatible models:

```bash
llm-checker check
```

Find the best coding model:

```bash
llm-checker recommend --category coding
```

Search for small, fast models under 5GB:

```bash
llm-checker search "7b" --max-size 5 --use-case fast
```

Get high-quality reasoning models:

```bash
llm-checker smart-recommend --use-case reasoning
```

## Development

```bash
git clone https://github.com/Pavelevich/llm-checker.git
cd llm-checker
npm install
node bin/enhanced_cli.js hw-detect
```

Project structure:

```
src/
  models/
    deterministic-selector.js    # Primary selection algorithm
    scoring-config.js            # Centralized scoring weights
    scoring-engine.js            # Advanced scoring (smart-recommend)
    catalog.json                 # Curated model catalog (35+ models)
  ai/
    multi-objective-selector.js  # Multi-objective optimization
    ai-check-selector.js         # LLM-based evaluation
  hardware/
    detector.js                  # Hardware detection
    unified-detector.js          # Cross-platform detection
  data/
    model-database.js            # SQLite storage (optional)
    sync-manager.js              # Database sync from Ollama registry
bin/
  enhanced_cli.js                # CLI entry point
```
## License

MIT License — see LICENSE for details.
