Pipecat: Getting interruptions to work

A client project with a voice agent built on Pipecat, Deepgram STT, Cartesia TTS, and OpenAI for the LLM. The agent needed to handle interruptions - stop talking when the user speaks.

Starting over

My initial attempt was overengineered. Custom VAD parameters, an Azure OpenAI client setup, a custom LLM service subclass, and RTVI components that weren't doing anything:

# ...
vad_analyzer=SileroVADAnalyzer(
    params=VADParams(
        stop_secs=0.5,
        start_secs=0.1,
        min_volume=0.4,
    )
),

# ...

azure_client = AsyncOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url=f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/...",
    default_query={"api-version": "2024-10-21"},
)

# ...

from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
task = PipelineTask(pipeline, params=..., observers=[RTVIObserver(rtvi)])

# ...

class CustomLLMService(OpenAILLMService):
    def __init__(self, agent, backend_client):
        super().__init__(...)
        self.agent = agent
# ...

I went back to Pipecat's examples and built a minimal bot following the run_bot() pattern:

async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    session_id = os.getenv("BOT_SESSION_ID", "default")

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

    tts = CartesiaHttpTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id=os.getenv("CARTESIA_VOICE_ID", "..."),
    )

    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

    # The pipeline below references a context aggregator; build it from the LLM service
    context = OpenAILLMContext(messages=[{"role": "system", "content": "..."}])
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])
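
The snippet above stops at the pipeline; it still needs to be wrapped in a task and run. A minimal sketch, assuming Pipecat's PipelineTask/PipelineParams/PipelineRunner API from this era (allow_interruptions being the switch that lets user speech barge in):

```python
# Sketch of running the pipeline above; assumes Pipecat's PipelineTask,
# PipelineParams and PipelineRunner APIs. allow_interruptions lets user
# speech cancel bot output mid-utterance.
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,  # let the user barge in over TTS output
    ),
)

runner = PipelineRunner()
await runner.run(task)
```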

Interruptions

I used Pipecat's built-in MinWordsInterruptionStrategy, together with the StartInterruptionFrame and StopInterruptionFrame frame types.
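
Conceptually, a min-words strategy buffers the user's transcribed words and only treats speech as a real interruption once a word-count threshold is crossed, which filters out backchannels like "uh-huh". A rough illustration of the idea — a hypothetical sketch, not Pipecat's actual MinWordsInterruptionStrategy implementation:

```python
# Hypothetical sketch of the min-words idea: only treat user speech as a
# real interruption once enough words have been transcribed.
class MinWordsGate:
    def __init__(self, min_words: int = 3):
        self.min_words = min_words
        self._words = 0

    def append_text(self, text: str) -> None:
        # Accumulate words from incremental STT results.
        self._words += len(text.split())

    def should_interrupt(self) -> bool:
        return self._words >= self.min_words

    def reset(self) -> None:
        # Called when the user's turn ends.
        self._words = 0


gate = MinWordsGate(min_words=3)
gate.append_text("uh")
print(gate.should_interrupt())  # False: a lone filler word is ignored
gate.append_text("stop talking please")
print(gate.should_interrupt())  # True: enough words, interrupt the bot
```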

An issue appeared: database writes were being cancelled on interruption. The agent was calling tool functions to update the frontend, and those were being killed mid-write. The fix was to pass cancel_on_interruption=False when registering each function:

from pipecat.audio.interruptions.min_words_interruption_strategy import (
    MinWordsInterruptionStrategy,
)
from pipecat.frames.frames import (
    StartInterruptionFrame,
    StopInterruptionFrame,
)

llm.register_function(
    "update_item",
    function_handler.update_item,
    cancel_on_interruption=False,
)
llm.register_function(
    "complete_item",
    function_handler.complete_item,
    cancel_on_interruption=False,
)
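
Importing MinWordsInterruptionStrategy is only half the wiring; the strategy also has to be handed to the pipeline's params. Assuming the interruption_strategies parameter on PipelineParams (check your Pipecat version's docs), that looks roughly like:

```python
# Sketch: hand the strategy to PipelineParams so interruptions only fire
# once the user has said a few words. Assumes Pipecat's
# interruption_strategies parameter on PipelineParams.
from pipecat.audio.interruptions.min_words_interruption_strategy import (
    MinWordsInterruptionStrategy,
)
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        interruption_strategies=[MinWordsInterruptionStrategy(min_words=3)],
    ),
)
```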

Interruptions still didn't work cleanly. I removed the custom VAD parameters in favour of Pipecat's defaults, which didn't help either:

vad_analyzer=SileroVADAnalyzer(),

The actual problem

Turns out the issue was Cartesia's TTS service. Pipecat has two variants: CartesiaTTSService (WebSocket, streams audio progressively) and CartesiaHttpTTSService (HTTP, returns complete audio chunks). The streaming variant doesn't allow for clean interruptions - audio frames are already in flight. The HTTP variant sends complete chunks that can be discarded:

from pipecat.services.cartesia.tts import CartesiaTTSService, CartesiaHttpTTSService
tts = CartesiaHttpTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id=os.getenv("CARTESIA_VOICE_ID", "..."),
)

After switching to the HTTP variant and dropping the unused RTVI components (the Daily transport and @pipecat-ai/client-js were being used directly), I had a working MVP with clean interruptions.


This dates back to August 2025. Pipecat has since matured and the approach described here is being deprecated in favour of user turn strategies. Documentation is significantly better now as well.

S.D.G