“Six months ago this needed an RTX 5090. Now it runs on your M3 Pro.”
Six months ago, running real-time voice AI locally required an RTX 5090. Today, Parlor does it on an M3 Pro using Google's new 2.6GB Gemma 4 E2B model. It's an open-source app that lets you talk to your computer and show it things via camera — everything runs locally with 2.5-3 second end-to-end latency. The project launched April 3-6, 2026 and already has 705 GitHub stars, but carries a 'research preview' warning with known security gaps.
You know that feeling when you want to build a voice AI feature but cloud APIs cost $20/month per user and send all your data to someone else's servers? Or when you see OpenAI's multimodal demos and think 'that's exactly what I need' but realize it requires their infrastructure? Running real-time voice+vision AI locally used to demand a desktop GPU that costs more than a used car. Parlor exists because its creator runs a free English-learning voice AI service and needed to eliminate server costs entirely.
Your browser captures microphone audio and camera frames, sending them over WebSocket to a local FastAPI server. The server feeds audio and JPEG images into Gemma 4 E2B (Google's 2.3B parameter multimodal model) via LiteRT-LM, which understands both speech and vision simultaneously. The model generates text responses, which Kokoro TTS converts to speech — streaming sentence-by-sentence back to your browser. Silero VAD in the browser detects when you're speaking so you don't need push-to-talk, and barge-in lets you interrupt the AI mid-sentence.
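The sentence-by-sentence streaming step described above can be sketched in plain Python. This is an illustrative sketch, not Parlor's actual code: the function name `stream_sentences` and the token format are assumptions. The idea is that model tokens arrive incrementally, and each completed sentence is flushed to TTS as soon as a sentence terminator appears, instead of waiting for the full response.

```python
import re
from typing import Iterable, Iterator

# Hypothetical sketch: sentences end at ., !, or ? followed by whitespace.
_SENTENCE_END = re.compile(r"([.!?])\s")

def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as they accumulate from a token stream,
    so TTS can start speaking before the model finishes generating."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush every complete sentence currently in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if not match:
                break
            end = match.end(1)
            yield buffer[:end].strip()
            buffer = buffer[end:]
    # Flush any trailing partial sentence once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

For example, feeding the token stream `["Hel", "lo there. ", "How are", " you?"]` yields `"Hello there."` as soon as the second token arrives, while the rest of the response is still being generated. In Parlor's design this is what keeps perceived latency in the 2.5-3 second range: audio playback begins after the first sentence, not after the whole reply.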
If you're a developer curious about on-device AI and want to see what's now possible on laptop hardware, this is your demo. It's also relevant if you're building privacy-first applications or need zero-marginal-cost voice AI. It's not for you if you need production-ready code (security issues exist), Windows support (LiteRT-LM doesn't support it), or agentic coding capabilities (the creator explicitly says it...).
Yes, but strictly as an experiment. The project is 4 days old (April 3-6, 2026), carries a 'research preview' label, and has 3 open issues, including security vulnerabilities. What makes it worth your time: it proves real-time multimodal AI now runs on laptop-class hardware, and 705 GitHub stars within days of launch show genuine developer interest. Try it to understand what's now possible locally, but don't build on it yet; wait for security patches and broader platform support.
This page gives you the hook. The full Snaplyze digest goes deeper so you can move from curiosity to decision with less noise.
Open the full digest for the deeper breakdown, viewpoint comparisons, Easy Mode, Pro Mode, and practical next-step playbooks you can actually use.