Voice Augmentation

An agent that passes every test under clean audio conditions may fail the same tests under realistic ones. Voice benchmarking research consistently shows a pass-rate drop between clean and noisy environments, and the drop isn't just about speech recognition accuracy. Background noise, interruptions, and cross-talk test whether the agent's conversation logic holds up: turn detection, recovery from barge-in, tool-calling sequences under garbled input, and handling off-mic speech.

Augmentation is how you create those conditions. It injects realistic audio effects (noise, interruptions, side conversations) into your voice runs so you can run the same test suite under controlled adversity.

What Augmentation Does

Each augmentation strategy modifies one aspect of the audio or conversation flow during the simulation:

Strategy	What it does	When to use
`noise`	Mixes background noise (cafeteria, traffic, etc.) into the caller's audio at a configured SNR	Validate ASR robustness; baseline of every realistic run
`barge_in`	The caller interrupts the agent mid-reply with a short prompt	Test the agent's interruption handling and recovery
`backchannel`	Caller utters a short acknowledgement ("mm-hmm", "right") while the agent speaks	Test the agent's turn-detection logic
`directed_speech`	Caller turns away mid-call to address someone in their environment	Test the agent against off-mic cross-talk
`secondary_speaker`	A second voice in the room interjects with on-topic commentary	Test multi-speaker robustness and confusion handling
`cap`	Driver emits multiple consecutive messages per turn (Concurrent Ask Probability)	Stress-test agent behavior when the caller sends a burst of messages

Composition Rule

Augmentation is configured with the augmentation parameter of okareo.run_simulation(...). Composition is constrained:

You may use at most one strategy, or noise combined with one other strategy.

Valid combinations: noise alone, barge_in alone, noise + barge_in, noise + backchannel, and likewise for any other single strategy with or without noise. Invalid: barge_in + backchannel, or any three strategies together.
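The rule can be checked before a run is submitted. The sketch below is a hypothetical local helper (`is_valid_combination` is not part of the Okareo SDK) that applies the rule to a list of strategy keys. It treats an empty selection as "no augmentation" and therefore valid, which is an assumption.

```python
# Hypothetical helper for illustration; not an Okareo SDK function.
STRATEGIES = {
    "noise", "barge_in", "backchannel",
    "directed_speech", "secondary_speaker", "cap",
}

def is_valid_combination(selected):
    """Return True for one strategy, or noise plus one other strategy."""
    chosen = set(selected)
    unknown = chosen - STRATEGIES
    if unknown:
        raise ValueError(f"unknown strategies: {sorted(unknown)}")
    # noise may accompany anything; at most one non-noise strategy is allowed.
    non_noise = chosen - {"noise"}
    return len(non_noise) <= 1

print(is_valid_combination(["noise", "barge_in"]))
print(is_valid_combination(["barge_in", "backchannel"]))
```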

Configuring Augmentation

In the App

In the simulation form, once a voice target is selected, open Advanced Settings and scroll to Voice Simulation Settings:

Background Noise: Select a profile from the dropdown (Cafeteria, Classroom, Office Babble, Traffic, or None), then set the Signal to Noise Ratio (-5 to 25 dB). Lower SNR means more noise relative to the speaker.
Voice Augmentation: Pick one strategy from the chips: Barge-In, Backchannel, Directed Speech, Secondary Speaker, or Concurrent Ask. Each strategy shows its own parameter fields when selected.
Background noise and the augmentation strategy are independent controls. Combine noise with one strategy (e.g. noise + barge-in); you cannot select two strategies at once.

UI vs SDK defaults

Some parameter defaults differ between the UI form and the server/SDK. Where they differ, the strategy parameter tables below note the server defaults. If you need the same behavior across both paths, set values explicitly.

From the SDK

Pass an Augmentation container to okareo.run_simulation(...). Each strategy has a typed config class in okareo.augmentations; set one strategy field, or noise plus one other.

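# Assumes `okareo` (an authenticated client), `target`, `scenario`, and `driver`
# were created earlier in your script.
from okareo.augmentations import Augmentation, BargeInAugmentation, NoiseAugmentation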

result = okareo.run_simulation(
    name="Noise + Barge-In",
    target=target,
    scenario=scenario,
    driver=driver,
    max_turns=5,
    first_turn="driver",
    checks=["avg_turn_taking_latency", "result_completed"],
    augmentation=Augmentation(
        noise=NoiseAugmentation(profile="cafeteria", snr_db=10),
        barge_in=BargeInAugmentation(
            probability=0.5,
            min_offset_ms=200,
            max_offset_ms=600,
            prompt="Ask for a very short polite interruption.",
        ),
    ),
)
print(f"Results: {result.app_link}")

The strategy classes are NoiseAugmentation, BargeInAugmentation, BackchannelAugmentation, DirectedSpeechAugmentation, SecondarySpeakerAugmentation, and CAPAugmentation, matching the strategy keys in the table above.

Strategy Parameters

Parameter names, types, and defaults below are from the server strategy constructors (the source of truth for what the API accepts).

`noise`

Parameter	Type	Valid values	Default
`noise_profile`	string	`cafeteria`, `classroom`, `office_babble`, `traffic`	required (no default)
`noise_snr_db`	number	-5 to 25	`10`
`seed`	int or null	Any integer for reproducibility	`null`

A lower SNR means more noise relative to the speaker. 10 dB is moderate; try 0 to 5 dB for stress conditions.
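To make the scale concrete: SNR in dB is 10·log10(P_speech / P_noise), where P is mean signal power. The sketch below computes the gain a noise track would need to reach a target SNR against a speech track. It is an illustration of the arithmetic only, using synthetic signals and hypothetical helper names; it is not Okareo's mixing code.

```python
import math
import random

def mean_power(samples):
    """Mean squared amplitude of a sequence of float samples."""
    return sum(s * s for s in samples) / len(samples)

def noise_gain_for_snr(speech, noise, snr_db):
    """Linear gain for `noise` so the speech/noise power ratio equals snr_db."""
    target_ratio = 10 ** (snr_db / 10)  # power ratio implied by the dB value
    return math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))

def mix_at_snr(speech, noise, snr_db):
    """Add scaled noise to speech, sample by sample (equal-length inputs)."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]

# Synthetic stand-ins: a 440 Hz tone as "speech", seeded uniform noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(16000)]

for snr_db in (25, 10, 0):
    gain = noise_gain_for_snr(speech, noise, snr_db)
    print(f"target SNR {snr_db} dB -> noise gain {gain:.3f}")
```

Each 10 dB drop in SNR multiplies the noise power by 10, which is why the 0 to 5 dB range is noticeably harsher than the default.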

Tip: The UI form defaults noise_profile to cafeteria, but the API requires you to specify it explicitly. Omitting it returns a 400 error.

Aliases accepted: profile → noise_profile, snr_db → noise_snr_db. The SDK's NoiseAugmentation(profile=..., snr_db=...) uses the alias names.
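As a rough picture of how the aliases and the table above fit together, the following sketch normalizes a noise configuration dictionary locally. `normalize_noise_config` is a hypothetical helper written for this page; the server's actual validation may differ in order and error messages.

```python
# Hypothetical local helper; mirrors the noise parameter table, not server code.
NOISE_ALIASES = {"profile": "noise_profile", "snr_db": "noise_snr_db"}
NOISE_PROFILES = {"cafeteria", "classroom", "office_babble", "traffic"}

def normalize_noise_config(raw):
    """Map alias keys to canonical names, apply defaults, and range-check."""
    config = {NOISE_ALIASES.get(key, key): value for key, value in raw.items()}
    config.setdefault("noise_snr_db", 10)  # documented server default
    config.setdefault("seed", None)
    if config.get("noise_profile") not in NOISE_PROFILES:
        raise ValueError("noise_profile is required and must be a known profile")
    if not -5 <= config["noise_snr_db"] <= 25:
        raise ValueError("noise_snr_db must be between -5 and 25")
    return config

print(normalize_noise_config({"profile": "cafeteria", "snr_db": 5}))
```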

`barge_in`

Parameter	Type	Range / Notes	Default
`prompt`	string	LLM instruction for the interruption phrasing	required
`probability`	number	0 to 1 (per agent turn)	`0.2`
`min_offset_ms`	int	0 or above	`200`
`max_offset_ms`	int	at least `min_offset_ms`	`600`
`seed`	int or null	Reproducibility	`null`

probability is evaluated once per agent reply. min_offset_ms and max_offset_ms bound the delay, measured from when the agent starts speaking, before the interruption fires.
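The interaction between probability, the offset window, and seed can be sketched as a per-turn gate. This is a simplified model, not the server's implementation: `plan_barge_ins` is a hypothetical function, and drawing the offset uniformly from the window is an assumption, since the sampling distribution is not documented here.

```python
import random

def plan_barge_ins(num_agent_turns, probability=0.2,
                   min_offset_ms=200, max_offset_ms=600, seed=None):
    """Return one entry per agent turn: an offset in ms, or None for no interruption."""
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    if min_offset_ms < 0 or max_offset_ms < min_offset_ms:
        raise ValueError("need 0 <= min_offset_ms <= max_offset_ms")
    rng = random.Random(seed)  # a fixed seed makes the plan repeatable
    plan = []
    for _ in range(num_agent_turns):
        if rng.random() < probability:
            # Assumed uniform draw within the configured window.
            plan.append(rng.randint(min_offset_ms, max_offset_ms))
        else:
            plan.append(None)
    return plan

print(plan_barge_ins(5, probability=0.5, seed=42))
```

Under this model, a run with a fixed seed produces the same interruption schedule each time, which is useful when comparing two agent versions.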

`backchannel`

Parameter	Type	Range / Notes	Default
`utterance`	string	Non-empty after trim	`mm-hmm`
`probability`	number	0 to 1	`0.35`
`min_offset_ms`	int	0 or above	`150`
`max_offset_ms`	int	at least `min_offset_ms`	`450`
`seed`	int or null	Reproducibility	`null`

Backchannels repeat while the target is speaking. They do not consume a turn.

Tip: The UI form uses different offset defaults (min_offset_ms 1000, max_offset_ms 10000). The values above are the server defaults. When calling via the SDK, specify the offsets explicitly if you want the wider window.

`directed_speech`

Parameter	Type	Range / Notes	Default
`prompt`	string or null	LLM instruction for the off-mic remark	(built-in template)
`probability`	number	0 to 1	`0.3`
`lpf_cutoff_hz`	number	1 to 20000	`800`
`gain_db`	number	-40 to 0	`-8`

Low-pass filter and gain attenuation simulate the muffled, off-mic quality of someone turning their head away from the phone.
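For intuition about what lpf_cutoff_hz and gain_db do to the audio, here is a minimal sketch using a one-pole low-pass filter followed by a dB-to-linear gain. The filter design and the helper names are assumptions chosen for brevity; the filter Okareo actually applies is not specified here.

```python
import math

def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear multiplier."""
    return 10 ** (gain_db / 20)

def one_pole_low_pass(samples, cutoff_hz, sample_rate_hz):
    """Simple RC-style low-pass: attenuates content above cutoff_hz."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out

def muffle(samples, sample_rate_hz, lpf_cutoff_hz=800, gain_db=-8):
    """Low-pass then attenuate, approximating speech directed away from the mic."""
    gain = db_to_linear(gain_db)
    return [gain * y for y in one_pole_low_pass(samples, lpf_cutoff_hz, sample_rate_hz)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

rate = 16000
low_tone = [math.sin(2 * math.pi * 200 * t / rate) for t in range(rate)]
high_tone = [math.sin(2 * math.pi * 4000 * t / rate) for t in range(rate)]
print(f"200 Hz RMS after muffle:  {rms(muffle(low_tone, rate)):.3f}")
print(f"4000 Hz RMS after muffle: {rms(muffle(high_tone, rate)):.3f}")
```

The high tone loses far more energy than the low one, which is the muffled quality the defaults aim for; the -8 dB gain then scales everything to roughly 40% of its amplitude.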

`secondary_speaker`

Parameter	Type	Range / Notes	Default
`secondary_prompt`	string or null	LLM instruction for the second speaker's interjection	(built-in template)
`secondary_voice`	string	Named voice from Okareo's voice catalog	`Cathy - Coworker`
`probability`	number	0 to 1	`0.3`
`inter_speaker_pause_ms`	int	0 to 5000	`120`
`lpf_cutoff_hz`	number or null	1 to 20000 (null disables)	`800`
`gain_db`	number	-40 to 0	`-8`

Aliases accepted: prompt → secondary_prompt, voice → secondary_voice. The SDK's SecondarySpeakerAugmentation(voice=..., prompt=...) uses the alias names.

Tip: The UI form defaults to Carson - Curious Conversationalist. The server default is Cathy - Coworker. Specify secondary_voice explicitly when calling via SDK.

`cap`

Parameter	Type	Range / Notes	Default
`probability`	number	0 to 1	`0.3`
`pause_ms`	int or null	0 to 10000, maps to `turn_transition_time` on the target	`1000`

When the probability gate fires, the driver emits multiple consecutive messages in a single turn. The number of messages comes from the driver's get_default_consecutive_messages() setting (typically 2). pause_ms controls the gap between the driver finishing and the target responding.

Reading Augmented Results

Compare the augmented run's mean_scores against a clean baseline. The expected pattern:

avg_turn_taking_latency rises as the agent works harder against noise or interruptions.
result_completed may drop if the agent can't handle the augmented conditions.
response_loop flips to 0 if the agent gets stuck after an interruption.
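A quick way to see these shifts is to subtract the baseline scores from the augmented ones, check by check. The sketch below assumes mean_scores can be read as a mapping from check name to a number; the values are placeholders for illustration, not measured results, and `score_deltas` is a hypothetical helper.

```python
def score_deltas(baseline, augmented):
    """Per-check change (augmented minus baseline) for checks present in both runs."""
    return {
        name: augmented[name] - baseline[name]
        for name in baseline
        if name in augmented
    }

# Placeholder numbers only; substitute the mean_scores from your own runs.
baseline_scores = {
    "avg_turn_taking_latency": 900.0,
    "result_completed": 1.0,
    "response_loop": 1.0,
}
augmented_scores = {
    "avg_turn_taking_latency": 1400.0,
    "result_completed": 0.8,
    "response_loop": 1.0,
}

for name, delta in score_deltas(baseline_scores, augmented_scores).items():
    print(f"{name}: {delta:+.2f}")
```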

Figure: Augmented run results showing the impact of noise and barge-in on check scores.

For a structured comparison between baseline and augmented runs, see Experimentation and A/B Testing.

Where to Go Next

Voice Checks: checks that respond to augmented conditions.
Experimentation and A/B Testing: diff augmented vs baseline runs.
Load Testing: combine augmentation with concurrency.

Cookbook

Full runnable script: 06_augmentation.py
