Hackathon Project: Real-time voice+vision AI runs on your laptop
Snaplyze Digest
GitHub Repos · Intermediate · 2 min read · Apr 7, 2026 · Updated Apr 15, 2026


“Six months ago this needed an RTX 5090. Now it runs on your M3 Pro.”

In Short

Six months ago, running real-time voice AI locally required an RTX 5090. Today, Parlor does it on an M3 Pro using Google's new 2.6GB Gemma 4 E2B model. It's an open-source app that lets you talk to your computer and show it things via camera, with everything running locally at 2.5-3 seconds of end-to-end latency. The project was built over April 3-6, 2026 and already has 705 GitHub stars, but it carries a "research preview" warning with known security gaps.

ai · open-source · python · llm · voice-assistant
Why It Matters
The practical pain point this digest is really about.

You know that feeling when you want to build a voice AI feature but cloud APIs cost $20/month per user and send all your data to someone else's servers? Or when you see OpenAI's multimodal demos and think 'that's exactly what I need' but realize it requires their infrastructure? Running real-time voice+vision AI locally used to demand a desktop GPU that costs more than a used car. Parlor exists because its creator runs a free English-learning voice AI service and needed to eliminate server costs entirely.

How It Works
The mechanism, architecture, or workflow behind it.

Your browser captures microphone audio and camera frames, sending them over WebSocket to a local FastAPI server. The server feeds audio and JPEG images into Gemma 4 E2B (Google's 2.3B parameter multimodal model) via LiteRT-LM, which understands both speech and vision simultaneously. The model generates text responses, which Kokoro TTS converts to speech — streaming sentence-by-sentence back to your browser. Silero VAD in the browser detects when you're speaking so you don't need push-to-talk, and barge-in lets you interrupt the AI mid-sentence.
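The sentence-by-sentence streaming step described above can be sketched in a few lines: buffer model tokens as they arrive and emit each sentence the moment its terminator appears, so TTS can start speaking before generation finishes. This is an illustrative sketch, not Parlor's actual code; the `stream_sentences` helper and its splitting regex are assumptions.

```python
import re

# A sentence is any run of text ending in . ! or ? followed by whitespace.
SENT_END = re.compile(r"(.+?[.!?])\s+", re.S)

def stream_sentences(tokens):
    """Yield complete sentences from a stream of model tokens as soon as
    each one is finished, instead of waiting for the full response."""
    buf = ""
    for tok in tokens:
        buf += tok
        # Emit every complete sentence currently sitting in the buffer.
        while (m := SENT_END.match(buf)):
            yield m.group(1)
            buf = buf[m.end():]
    # Flush whatever trails after the last terminator.
    if buf.strip():
        yield buf.strip()
```

Each yielded sentence would be handed to the TTS engine immediately, which is what cuts the perceived latency: the user hears the first sentence while later ones are still being generated.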

Key Takeaways
7 fast bullets that make the core value obvious.
  • Hands-free voice detection — Silero VAD automatically knows when you're speaking, no button-pressing required
  • Barge-in support — interrupt the AI mid-sentence just like a real conversation
  • Sentence-level TTS streaming — audio starts playing before the full response finishes generating, cutting perceived latency
  • On-device multimodal — Gemma 4 E2B processes speech and vision together without cloud calls
  • Cross-platform TTS — uses MLX acceleration on Mac, ONNX on Linux for fast text-to-speech
  • Auto-model download — pulls 2.6GB Gemma model from HuggingFace on first run, no manual setup
  • Real-time vision — camera feeds JPEG frames directly to the model for live object recognition and discussion
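The hands-free and barge-in behavior in the first two bullets reduces to a per-frame speech decision driving a small state machine. Below is a hedged sketch that uses a crude energy threshold in place of the Silero VAD neural model; the `SimpleVad` class, its threshold, and its hangover logic are illustrative, not Parlor's implementation.

```python
import math

def frame_rms(frame):
    """Root-mean-square energy of one frame of float audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

class SimpleVad:
    """Energy-threshold stand-in for a real VAD, showing how per-frame
    speech decisions drive hands-free capture and barge-in."""

    def __init__(self, threshold=0.02, hang_frames=10):
        self.threshold = threshold      # RMS level counted as speech
        self.hang_frames = hang_frames  # silent frames before end-of-turn
        self._silence = 0
        self.speaking = False

    def process(self, frame):
        """Return 'start', 'continue', 'end', or None for one frame."""
        if frame_rms(frame) >= self.threshold:
            self._silence = 0
            if not self.speaking:
                self.speaking = True
                return "start"      # user began talking
            return "continue"
        if self.speaking:
            self._silence += 1
            if self._silence >= self.hang_frames:
                self.speaking = False
                return "end"        # turn finished
        return None
```

In the real app, a "start" event while TTS audio is playing would cut off playback (barge-in), and "end" would flush the captured audio to the model as a completed turn.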
Should You Care?
Audience fit, decision signal, and the original source in one place.

Who It Is For

If you're a developer curious about on-device AI who wants to see what's now possible on laptop hardware, this is your demo. It's also relevant if you're building privacy-first applications or need zero-marginal-cost voice AI. It's not for you if you need production-ready code (security issues exist), Windows support (LiteRT-LM doesn't support it), or agentic coding capabilities (the creator explicitly says it isn't that).

Worth Exploring?

Yes, but strictly as an experiment. The project is 4 days old (April 3-6, 2026) with a 'research preview' label and 3 open issues including security vulnerabilities. What makes it worth your time: it proves real-time multimodal AI now runs on laptop-class hardware. The 705 GitHub stars in days show genuine developer interest. Try it to understand what's now possible locally, but don't build on it yet — wait for security patches and broader platform support.
