LLMfit - If you have a computer install it now to see what models you can run
Snaplyze Digest
GitHub Repos · Beginner · 3 min read · Apr 16, 2026 · Updated Apr 20, 2026

“23K-star tool says your hardware can't run a model — while it's running in the background right now.”

In Short

llmfit is a Rust CLI/TUI that detects your system's RAM, CPU, and GPU specs and scores ~206 HuggingFace models for compatibility — but its speed estimation uses a theoretical memory-bandwidth formula, not real benchmarks. An HN user reported the tool said Qwen 3.5 wouldn't fit on their machine while they were actively running it, highlighting the gap between theoretical estimation and runtime reality. It has 23,611 stars, 53 contributors, and 85 releases (latest v0.9.8, Apr 14, 2026).

Tags: ai · open-source · rust · cli · llm
Why It Matters
The practical pain point this digest is really about.

You know that feeling when you want to run a local LLM and you're staring at a wall of GGUF files, quantization levels, and VRAM numbers with no idea what actually fits your hardware? You download a 7B model, realize your GPU doesn't have enough VRAM, try a smaller quantization, and spend an hour in trial-and-error hell. There's no single tool that looks at your specific machine — your GPU vendor, your RAM, your CPU cores — and tells you 'here's exactly what runs and how fast.'

How It Works
The mechanism, architecture, or workflow behind it.

Think of it like a fitness test for your computer's AI capabilities. llmfit detects your hardware via system utilities (nvidia-smi, rocm-smi, system_profiler) and loads an embedded database of ~206 HuggingFace models. For each model, it picks the best quantization level, estimates memory usage (including MoE expert offloading), and computes a multi-dimensional score (Quality/Speed/Fit/Context, each 0-100) weighted by your use case — Chat weights Speed at 0.35, Reasoning weights Quality at 0.55. Speed estimation uses a formula: (memory_bandwidth_GB_s / model_size_GB) × 0.55, validated against ~80 GPUs from published llama.cpp benchmarks. You get a ranked table via CLI, TUI, JSON, or REST API.
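The bandwidth-based speed estimate described above can be sketched in a few lines. This is a minimal illustration of the formula as the digest states it; the function name and the sample hardware figures (an RTX 4090 at ~1008 GB/s, a ~4.4 GB Q4-quantized 7B model) are assumptions for the example, not llmfit's actual API:

```python
def estimate_tok_per_s(memory_bandwidth_gb_s: float, model_size_gb: float,
                       efficiency: float = 0.55) -> float:
    """Theoretical decode speed: generating each token streams the whole
    model through memory once, scaled by an empirical efficiency factor
    (0.55, per the digest, calibrated against llama.cpp benchmarks)."""
    return memory_bandwidth_gb_s / model_size_gb * efficiency

# Example: ~1008 GB/s bandwidth with a ~4.4 GB quantized 7B model
print(round(estimate_tok_per_s(1008, 4.4)))  # ~126 tok/s
```

The key intuition is that token generation is memory-bound: estimated speed scales with bandwidth and inversely with model size, which is why quantization (a smaller model file) directly buys you tokens per second.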

Key Takeaways
7 fast bullets that make the core value obvious.
  • Multi-vendor hardware detection — why you care: it auto-detects NVIDIA, AMD, Intel, Apple Silicon, and Ascend NPUs via vendor-specific tools, so you get accurate specs without manually looking up your GPU model and VRAM.
  • Multi-dimensional scoring (Quality/Speed/Fit/Context, each 0-100) — why you care: you get use-case-weighted rankings instead of a binary fits/doesn't-fit, so a Chat use case prioritizes speed while a Reasoning task prioritizes quality.
  • Dynamic quantization selection — why you care: the tool automatically picks the best quantization level (Q4_K_M, Q5_K_M, etc.) for your hardware rather than making you research GGUF quantization tiers yourself.
  • MoE architecture support — why you care: Mixtral 8x7B has 46.7B total params but only activates ~12.9B per token; llmfit accounts for expert offloading, showing you can run it in ~6.6 GB VRAM instead of the naive 23.9 GB.
  • REST API mode (llmfit serve) — why you care: you can integrate hardware-aware model selection into a cluster scheduler or CI pipeline, querying endpoints like /api/v1/models/top with filters for fit level, runtime, and other criteria.
  • Hardware simulation mode — why you care: you can test model fit on hypothetical hardware specs (press S in TUI) before buying a GPU, giving you a purchase-decision tool.
  • Support for 5 runtime providers — why you care: covers Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio, so model recommendations match whichever inference engine you actually use.
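The MoE numbers in the takeaways can be reproduced with back-of-envelope arithmetic. This sketch assumes roughly 4.1 bits per weight for a Q4_K_M-style quantization (an approximation, not llmfit's exact accounting):

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float = 4.1) -> float:
    """Approximate memory footprint of a quantized model:
    params (in billions) x bits per weight / 8 bits per byte = GB."""
    return params_billion * bits_per_weight / 8

# Mixtral 8x7B: 46.7B total params, but only ~12.9B active per token
print(round(quantized_size_gb(46.7), 1))  # ~23.9 GB if all weights are loaded naively
print(round(quantized_size_gb(12.9), 1))  # ~6.6 GB if VRAM holds only active experts
```

The caveat: in a real expert-offloading setup the inactive experts still have to live somewhere (typically CPU RAM), so ~6.6 GB is the VRAM requirement, not the total memory requirement.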
Should You Care?
Audience fit, decision signal, and the original source in one place.

Who It Is For

If you're a developer experimenting with local LLMs on consumer hardware and tired of guessing whether a model fits your GPU's VRAM, llmfit gives you a quick compatibility scan. Also useful if you're deciding between hardware upgrades and want to model-fit before buying. Not for you if you need real benchmark data — the tok/s estimates are theoretical, and the compile-time model database of ~206 models inevitably lags behind new releases.

Worth Exploring?

Worth installing if you're getting started with local LLMs and want a quick hardware scan — `brew install llmfit && llmfit system` gives you immediate value. The project is at v0.9.8 with 85 releases and very active development (6 versions in 5 days in April 2026). Know the limitations: speed estimates are theoretical (memory-bandwidth formula with 0.55 efficiency factor), the ~206 model database is compile-time embedded so it lags new releases, and at least one HN user caught it wrongly claiming a model wouldn't run. Treat it as a first-pass filter, not a definitive answer.
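As a sketch of the serve-mode integration mentioned in the takeaways, one might build a query against a running `llmfit serve` instance like this. Only the `/api/v1/models/top` path comes from the digest; the port and the query-parameter names (`fit`, `runtime`, `limit`) are illustrative guesses, so check the project's docs for the real fields:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8080"  # assumed port, not confirmed by the digest
# Hypothetical filter names; the digest only says fit level and runtime are filterable
params = {"fit": "comfortable", "runtime": "ollama", "limit": 5}
url = f"{BASE}/api/v1/models/top?{urlencode(params)}"
print(url)
# A CI pipeline or cluster scheduler could fetch this URL with any HTTP
# client and pick the top-ranked model for the node's detected hardware.
```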

View original source