GitHub Repos intermediate 3 min read May 25, 2026

Dograh: The Open-Source Vapi Alternative You Run on Your Own Server

“Vapi raised $50M and closed self-hosting — Dograh is the BSD-2 clone that runs on your own server, free.”

Source · github.com

“"Dograh is an open source alternative to Vapi, not a clone though. Vapi/Retell are closed platforms; this is open source infra you self-host and modify. Like saying n8n is a clone of Zapier because they solve the same problem. Same category, but fundamentally different model." —...”

You know that feeling when your voice agent vendor charges per minute, routes your customers' audio through its infrastructure, and hands you a compliance checklist you cannot satisfy because the data never touched your servers? Every managed platform (Vapi, Retell, Bland) controls the audio pipeline, charges for usage, and owns the compliance certifications. Building the full stack yourself means stitching together Twilio webhooks, streaming STT, an LLM call loop, TTS synthesis, WebRTC signaling, and telephony provider SDKs, then maintaining all of it whenever a provider changes its API. Dograh wraps that entire integration in a Docker Compose setup with a visual workflow builder, so you own the deployment without assembling the plumbing from scratch.

voice-ai · open-source · python · self-hosted · telephony · webrtc · fastapi

You define your voice agent as a directed graph in the UI (start node, LLM call nodes, tool call nodes, conditional edges), and Dograh runs that graph on top of Pipecat, an open-source real-time audio pipeline. When a call arrives via Twilio, Vonage, or WebRTC, Pipecat handles audio framing and feeds the audio to your configured STT provider (Deepgram, Speechmatics) to get a transcript. That transcript goes to your LLM (OpenAI, Gemini, OpenRouter), the response goes to your TTS provider (Cartesia, Dograh's native TTS), and the synthesized audio goes back to the caller. Per the maintainer, end-to-end latency is 500–600 ms on fast model configurations. The entire stack runs in Docker Compose on a server you control: `docker compose up` brings up the FastAPI backend and the Next.js UI on port 3010.
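To make the turn loop described above concrete, here is a minimal sketch of a graph-driven agent that runs one STT → LLM → TTS turn and then follows a conditional edge. It is not Dograh's or Pipecat's actual API: `Node`, `Agent`, `handle_turn`, and the `fake_*` providers are hypothetical names invented for illustration, and the providers are plain functions so the example runs without network access or API keys.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical provider signatures; real providers would stream audio and text.
SttFn = Callable[[bytes], str]        # audio -> transcript
LlmFn = Callable[[str, str], str]     # (node prompt, transcript) -> reply text
TtsFn = Callable[[str], bytes]        # reply text -> audio


@dataclass
class Node:
    name: str
    prompt: str
    # Conditional edges: (predicate on the transcript, target node name),
    # checked in order.
    edges: List[Tuple[Callable[[str], bool], str]] = field(default_factory=list)


@dataclass
class Agent:
    nodes: Dict[str, Node]
    current: str
    stt: SttFn
    llm: LlmFn
    tts: TtsFn

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one turn: audio -> transcript -> reply -> audio."""
        transcript = self.stt(audio_in)
        node = self.nodes[self.current]
        reply = self.llm(node.prompt, transcript)
        # Follow the first edge whose condition matches; otherwise stay put.
        for condition, target in node.edges:
            if condition(transcript):
                self.current = target
                break
        return self.tts(reply)


# Fake providers (assumptions for the demo, not real STT/LLM/TTS clients).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str, transcript: str) -> str:
    return f"[{prompt}] you said: {transcript}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def build_demo_agent() -> Agent:
    nodes = {
        "start": Node("start", "greet",
                      [(lambda t: "billing" in t.lower(), "billing")]),
        "billing": Node("billing", "help with billing"),
    }
    return Agent(nodes=nodes, current="start",
                 stt=fake_stt, llm=fake_llm, tts=fake_tts)


if __name__ == "__main__":
    agent = build_demo_agent()
    print(agent.handle_turn(b"I have a billing question"))
    print(agent.current)
```

Because the providers are injected as functions, swapping one STT or TTS backend for another in this sketch means passing a different callable; the graph and the turn loop stay unchanged.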

01 Multi-provider STT/LLM/TTS: you swap Deepgram for Speechmatics or OpenAI for Gemini without rewriting your agent logic, so a provider outage or price change does not force a rebuild
02 Docker Compose deployment: `docker compose up` brings the full backend and UI online in one command on any server you control, with the Next.js UI accessible at port 3010
03 Drag-and-drop workflow graph builder: you wire agent logic visually (start node → LLM node → conditional edge → tool call) instead of writing orchestration code, reducing time to first working agent
04 Telephony and WebRTC both supported: Twilio, Vonage, Telnyx, Cloudonix, and Asterisk ARI all plug into the same agent graph, so phone and browser calls share one codebase
05 MCP tool integration: v1.31.0 adds generic MCP (Model Context Protocol) tool sources with per-node function filtering, letting your agent call external services without writing custom connector code
06 Embeddings-based RAG knowledge base: attach a document corpus to your agent to ground responses in your own data, without building a separate retrieval pipeline
07 ElevenLabs Data Residency support: added in v1.29.0, routing TTS through a compliant endpoint for EU or HIPAA-adjacent audio processing without switching providers
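
The per-node function filtering mentioned in item 05 can be pictured as an allowlist applied to a shared tool catalog. The sketch below is an illustration of that idea only; it does not reflect Dograh's configuration schema or any MCP SDK. `Tool`, `tools_for_node`, and `CATALOG` are hypothetical names, and the tool entries are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Tool:
    source: str        # which tool source exposed this function
    name: str          # function name the LLM may call
    description: str


def tools_for_node(all_tools: List[Tool],
                   allowed: Optional[Dict[str, Set[str]]]) -> List[Tool]:
    """Return only the tools a given node may call.

    `allowed` maps a source name to the set of permitted function names.
    None means "no filter": the node sees every tool in the catalog.
    """
    if allowed is None:
        return list(all_tools)
    return [t for t in all_tools if t.name in allowed.get(t.source, set())]


# Made-up catalog for illustration.
CATALOG = [
    Tool("crm", "lookup_customer", "Fetch a customer record"),
    Tool("crm", "delete_customer", "Remove a customer record"),
    Tool("calendar", "book_slot", "Book an appointment"),
]

if __name__ == "__main__":
    # A greeting node that may only read CRM data.
    for tool in tools_for_node(CATALOG, {"crm": {"lookup_customer"}}):
        print(tool.source, tool.name)
```

Filtering per node keeps destructive functions out of the LLM's reach in parts of the conversation where they are never needed.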
Who it’s for

If you are building voice automation for BPO call centres, inbound support lines, or outbound dialing campaigns and you need call audio to stay on your own servers, Dograh gives you the managed-platform feature set under a BSD-2 license you can modify and ship. It is also a fit if you want to avoid per-minute SaaS pricing at high call volume. It is not the right pick if you need SOC 2/HIPAA/PCI compliance certifications out of the box (Vapi and Bland cover this), or if you need sub-400 ms turn latency (Bland claims 400 ms vs. Dograh's 500–600 ms on fast models, per the maintainer).
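
To judge where "high call volume" starts for your own case, a rough break-even calculation can help. The sketch below assumes a simple cost model: the managed platform charges a flat per-minute rate, while self-hosting has a fixed monthly cost (server, maintenance) plus a smaller per-minute cost for the STT/LLM/TTS providers you still pay directly. The function name and every number in the example are hypothetical placeholders, not real prices from Vapi, Dograh, or any provider.

```python
import math


def breakeven_minutes(saas_rate_per_min: float,
                      self_hosted_fixed_monthly: float,
                      self_hosted_rate_per_min: float) -> float:
    """Monthly call minutes above which self-hosting is cheaper.

    Solves: saas_rate * m = fixed + self_hosted_rate * m  for m.
    Returns math.inf when self-hosting never becomes cheaper.
    """
    if saas_rate_per_min <= self_hosted_rate_per_min:
        return math.inf
    return self_hosted_fixed_monthly / (saas_rate_per_min - self_hosted_rate_per_min)


if __name__ == "__main__":
    # Placeholder inputs: replace with your own quotes and invoices.
    minutes = breakeven_minutes(
        saas_rate_per_min=0.10,
        self_hosted_fixed_monthly=300.0,
        self_hosted_rate_per_min=0.04,
    )
    print(f"Break-even at about {minutes:,.0f} minutes per month")
```

With these placeholder inputs the break-even lands at roughly 5,000 minutes per month; below that volume, the managed platform would be the cheaper option under this model.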

Worth exploring

Worth a serious look if your use case requires self-hosted voice infrastructure and you have engineering time to patch the three open security audit issues (#330, #331, #340) before going live. The core Docker setup, workflow builder, and provider integrations work — but ElevenLabs TTS is currently broken (issue #334), so use Cartesia or Deepgram TTS as your default stack. If you need production deployment without hands-on patching, wait 1–2 months for the security backlog to clear.
