ClueCon Weekly with Sean Dubois [Sn. 15 Ep. 19]: WebRTC, RTP, and the Latency War in AI Voice

In this episode of ClueCon Weekly, host Jon Gray sits down with Sean Dubois—creator of Pion and a developer at OpenAI—to talk about why WebRTC is becoming the go-to real-time transport for AI voice systems.

We dig into what’s actually happening inside a modern voice AI pipeline: audio transport, decode, voice activity detection (VAD), speech-to-text, LLM processing, and response generation. Every one of those steps adds latency, and Sean explains where WebRTC helps you “buy time back,” especially when it comes to responsiveness, packet loss, and avoiding delays that break the flow of conversation.
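
As a rough illustration of how those stages stack up, here’s a back-of-the-envelope latency budget in Go. The stage names follow the pipeline above; the millisecond figures are placeholder assumptions for illustration, not numbers from the episode.

package main

import (
	"fmt"
	"time"
)

// stage is one step in a voice AI pipeline with an assumed latency budget.
type stage struct {
	name   string
	budget time.Duration
}

func main() {
	// Illustrative budgets only; real numbers depend on models, codecs, and network.
	pipeline := []stage{
		{"audio transport (network + jitter buffer)", 40 * time.Millisecond},
		{"decode (e.g. Opus)", 5 * time.Millisecond},
		{"voice activity detection (VAD)", 30 * time.Millisecond},
		{"speech-to-text", 200 * time.Millisecond},
		{"LLM processing (time to first token)", 300 * time.Millisecond},
		{"response generation (first TTS audio)", 150 * time.Millisecond},
	}

	var total time.Duration
	for _, s := range pipeline {
		total += s.budget
		fmt.Printf("%-45s +%-6v (running total %v)\n", s.name, s.budget, total)
	}
	fmt.Printf("end-to-end response latency: %v\n", total)
}

Even with optimistic per-stage numbers, the total lands well above what feels instant in conversation, which is why shaving transport latency matters so much.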

We also get into the practical debate: WebRTC vs WebSockets. WebSockets can be “good enough” for some teams, but Sean makes the case that WebRTC’s model (RTP, real-time delivery, and established patterns) avoids a lot of the custom protocol work—and the bugs—that come from rolling your own audio streaming approach.
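
To make the “rolling your own” point concrete, here’s a small Go sketch of the kind of frame header a WebSocket audio stream tends to grow: a sequence number to detect loss and reordering, and a timestamp to schedule playout. RTP already standardizes these fields (plus payload type and SSRC) along with battle-tested handling for jitter and loss. The header layout below is a hypothetical example, not a real protocol.

package main

import (
	"encoding/binary"
	"fmt"
)

// audioFrame is a hypothetical hand-rolled frame you might end up inventing
// when streaming encoded audio over a WebSocket.
type audioFrame struct {
	Seq       uint16 // detect loss / reordering
	Timestamp uint32 // schedule playout on the receiver
	Payload   []byte // encoded audio (e.g. one Opus frame)
}

// marshal packs the frame into a byte slice for transmission.
func (f audioFrame) marshal() []byte {
	buf := make([]byte, 6+len(f.Payload))
	binary.BigEndian.PutUint16(buf[0:2], f.Seq)
	binary.BigEndian.PutUint32(buf[2:6], f.Timestamp)
	copy(buf[6:], f.Payload)
	return buf
}

func main() {
	frame := audioFrame{Seq: 1, Timestamp: 960, Payload: []byte{0xde, 0xad}}
	fmt.Printf("% x\n", frame.marshal())
}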

Finally, we zoom out on open source and the AI era: how LLM-assisted coding is changing contribution patterns, why massive PRs are becoming harder to maintain, and what maintainers are seeing in the wild. Plus, Sean shares a recent highlight he’s excited about: OBS merging WebRTC simulcast support, making multi-bitrate streaming more accessible.

Topics covered:
▪️WebRTC as the transport layer for AI voice
▪️Latency, trust, and “natural” voice interactions
▪️Packet loss, resilience, and keeping calls responsive
▪️When WebSockets still make sense (and hybrid approaches)
▪️WebRTC on microcontrollers and lightweight builds
▪️Open source in the age of AI-generated code
▪️OBS + WebRTC updates (simulcast support)

If you’re building voice agents, real-time audio apps, or anything latency-sensitive, this one’s for you.