“it says I can't run Qwen 3.5 on my machine, while it is running in the background currently, coding. So, not sure what the true value of a tool like this is other than getting a first glimpse. — mittermayr (https://news.ycombinator.com/item?id=47211830)”
You know that feeling when you want to run a local LLM and you're staring at a wall of GGUF files, quantization levels, and VRAM numbers with no idea what actually fits your hardware? You download a 7B model, realize your GPU doesn't have enough VRAM, try a smaller quantization, and spend an hour in trial-and-error hell. What's been missing is a single tool that looks at your specific machine (your GPU vendor, your RAM, your CPU cores) and tells you 'here's exactly what runs and how fast.'
Think of it as a fitness test for your computer's AI capabilities. llmfit detects your hardware via system utilities (nvidia-smi, rocm-smi, system_profiler) and loads an embedded database of ~206 HuggingFace models. For each model, it picks the best quantization level, estimates memory usage (including MoE expert offloading), and computes a multi-dimensional score across Quality, Speed, Fit, and Context, each on a 0-100 scale. The dimensions are weighted by your use case: Chat weights Speed at 0.35, while Reasoning weights Quality at 0.55. Speed in tok/s is estimated with the formula (memory_bandwidth_GB_s / model_size_GB) × 0.55, validated against ~80 GPUs from published llama.cpp benchmarks. You get a ranked table via CLI, TUI, JSON, or REST API.
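To make the speed formula concrete, here is a minimal Python sketch. The function name, argument names, and the example figures (a 1000 GB/s GPU and a 5 GB quantized model) are illustrative assumptions, not llmfit's actual code or measured values; only the formula and the 0.55 efficiency factor come from the description above.

```python
def estimate_tokens_per_second(
    memory_bandwidth_gb_s: float,
    model_size_gb: float,
    efficiency: float = 0.55,
) -> float:
    """Theoretical tok/s: (bandwidth / model size) scaled by an efficiency factor.

    Hypothetical helper for illustration only; llmfit's real implementation
    may differ.
    """
    if memory_bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return (memory_bandwidth_gb_s / model_size_gb) * efficiency


# Assumed example inputs: 1000 GB/s of memory bandwidth, 5 GB model file.
estimate = estimate_tokens_per_second(1000.0, 5.0)
print(f"{estimate:.1f} tok/s")  # prints "110.0 tok/s"
```

The takeaway is that the estimate scales inversely with model size: halving the model's footprint with a smaller quantization roughly doubles the predicted speed, which is why this is a ceiling-style estimate and not a benchmark.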
If you're a developer experimenting with local LLMs on consumer hardware and tired of guessing whether a model fits your GPU's VRAM, llmfit gives you a quick compatibility scan. It's also useful if you're deciding between hardware upgrades and want to check which models would fit before buying. It's not for you if you need real benchmark data: the tok/s estimates are theoretical, and the compile-time database of ~206 models will always lag behind HuggingFace's full catalog.
Worth installing if you're getting started with local LLMs and want a quick hardware scan: `brew install llmfit && llmfit system` gives you immediate value. The project is at v0.9.8 with 85 releases and very active development (6 versions in 5 days in April 2026). Know the limitations: speed estimates are theoretical (a memory-bandwidth formula with a 0.55 efficiency factor), the ~206-model database is embedded at compile time so it lags new releases, and at least one HN user caught it wrongly claiming a model wouldn't run while that model was running on their machine. Treat it as a first-pass filter, not a definitive answer.