Voice Augmentation
An agent that passes every test under clean audio conditions may fail the same tests under realistic ones. Voice benchmarking research consistently shows a pass-rate drop between clean and noisy environments, and the drop isn't just about speech recognition accuracy. Background noise, interruptions, and cross-talk test whether the agent's conversation logic holds up: turn detection, recovery from barge-in, tool-calling sequences under garbled input, and handling off-mic speech.
Augmentation is how you create those conditions. It injects realistic audio effects (noise, interruptions, side conversations) into your voice runs so you can run the same test suite under controlled adversity.
What Augmentation Does
Each augmentation strategy modifies one aspect of the audio or conversation flow during the simulation:
| Strategy | What it does | When to use |
|---|---|---|
| noise | Mixes background noise (cafeteria, traffic, etc.) into the caller's audio at a configured SNR | Validate ASR robustness; baseline of every realistic run |
| barge_in | The caller interrupts the agent mid-reply with a short prompt | Test the agent's interruption handling and recovery |
| backchannel | Caller utters a short acknowledgement ("mm-hmm", "right") while the agent speaks | Test the agent's turn-detection logic |
| directed_speech | Caller turns away mid-call to address someone in their environment | Test the agent against off-mic cross-talk |
| secondary_speaker | A second voice in the room interjects with on-topic commentary | Test multi-speaker robustness and confusion handling |
| cap | Driver emits multiple consecutive messages per turn (Concurrent Ask Probability) | Stress-test agent behavior when the caller sends a burst of messages |
Composition Rule
Augmentation is configured with the augmentation parameter of okareo.run_simulation(...). Composition is constrained:
- You may use at most one strategy, or
- noise combined with one other strategy.
Valid combos: noise alone, barge_in alone, noise + barge_in, noise + backchannel, etc. Invalid: barge_in + backchannel, or any three strategies together.
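The rule above can be sketched as a small pre-flight check. This is an illustrative helper, not part of the Okareo SDK: the name `is_valid_composition` is hypothetical, and it only restates the rule using the strategy keys from the table.

```python
# Strategy keys as listed in the table above.
STRATEGIES = {
    "noise",
    "barge_in",
    "backchannel",
    "directed_speech",
    "secondary_speaker",
    "cap",
}


def is_valid_composition(selected):
    """Return True if the selected strategy keys satisfy the composition rule.

    Hypothetical helper: noise may accompany at most one other strategy,
    and no two non-noise strategies may be combined.
    """
    chosen = set(selected)
    unknown = chosen - STRATEGIES
    if unknown:
        raise ValueError(f"Unknown strategies: {sorted(unknown)}")
    # Everything except noise counts toward the one-strategy limit.
    non_noise = chosen - {"noise"}
    return len(non_noise) <= 1
```

Running a check like this before submitting a simulation catches an invalid combination such as barge_in + backchannel locally, before any request is made.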
Configuring Augmentation
In the App
In the simulation form, once a voice target is selected, open Advanced Settings and scroll to Voice Simulation Settings:
- Background Noise: Select a profile from the dropdown (Cafeteria, Classroom, Office Babble, Traffic, or None), then set the Signal to Noise Ratio (-5 to 25 dB). Lower SNR means more noise relative to the speaker.
- Voice Augmentation: Pick one strategy from the chips: Barge-In, Backchannel, Directed Speech, Secondary Speaker, or Concurrent Ask. Each strategy shows its own parameter fields when selected.
- Background noise and the augmentation strategy are independent controls. Combine noise with one strategy (e.g. noise + barge-in); you cannot select two strategies at once.

Some parameter defaults differ between the UI form and the server/SDK. Where they differ, the strategy parameter tables below note the server defaults. If you need the same behavior across both paths, set values explicitly.
From the SDK
Pass an Augmentation container to okareo.run_simulation(...). Each strategy has a typed config class in okareo.augmentations; set one strategy field, or noise plus one other.
```python
from okareo.augmentations import Augmentation, BargeInAugmentation, NoiseAugmentation

# Assumes `okareo`, `target`, `scenario`, and `driver` were created earlier.
result = okareo.run_simulation(
    name="Noise + Barge-In",
    target=target,
    scenario=scenario,
    driver=driver,
    max_turns=5,
    first_turn="driver",
    checks=["avg_turn_taking_latency", "result_completed"],
    augmentation=Augmentation(
        # Background noise may be combined with exactly one other strategy.
        noise=NoiseAugmentation(profile="cafeteria", snr_db=10),
        barge_in=BargeInAugmentation(
            probability=0.5,
            min_offset_ms=200,
            max_offset_ms=600,
            prompt="Ask for a very short polite interruption.",
        ),
    ),
)
print(f"Results: {result.app_link}")
```
The strategy classes are NoiseAugmentation, BargeInAugmentation, BackchannelAugmentation, DirectedSpeechAugmentation, SecondarySpeakerAugmentation, and CAPAugmentation, matching the strategy keys in the table above.
Strategy Parameters
Parameter names, types, and defaults below are from the server strategy constructors (the source of truth for what the API accepts).
noise
| Parameter | Type | Valid values | Default |
|---|---|---|---|
| noise_profile | string | cafeteria, classroom, office_babble, traffic | required (no default) |
| noise_snr_db | number | -5 to 25 | 10 |
| seed | int or null | Any integer for reproducibility | null |
Lower SNR = more noise relative to the speaker. 10 is moderate; try 0 to 5 for stress conditions.
The UI form defaults noise_profile to cafeteria, but the API requires you to specify it explicitly. Omitting it returns a 400 error.
Aliases accepted: profile → noise_profile, snr_db → noise_snr_db. The SDK's NoiseAugmentation(profile=..., snr_db=...) uses the alias names.
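To make the SNR setting concrete, here is a minimal sketch of mixing noise into speech at a target SNR. It uses the standard definition SNR_dB = 10 · log10(P_speech / P_noise). It is not Okareo's implementation: the function name `mix_at_snr` and the list-of-floats signal representation are assumptions made for illustration.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Returns (mixed_samples, noise_scale).

    Illustrative only; signals are equal-length lists of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")
    # Solve SNR_dB = 10 * log10(speech_power / scaled_noise_power)
    # for the noise power that yields the requested ratio.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    mixed = [s + scale * n for s, n in zip(speech, noise)]
    return mixed, scale
```

Under this definition, each 10 dB reduction in SNR multiplies the noise power by 10 relative to the speech, which is why values of 0 to 5 are considerably harsher than the default of 10.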
barge_in
| Parameter | Type | Range / Notes | Default |
|---|---|---|---|
| prompt | string | LLM instruction for the interruption phrasing | required |
| probability | number | 0 to 1 (per agent turn) | 0.2 |
| min_offset_ms | int | 0 or above | 200 |
| max_offset_ms | int | at least min_offset_ms | 600 |
| seed | int or null | Reproducibility | null |
Probability is evaluated once per agent reply. min_offset_ms and max_offset_ms bound the delay between the agent starting to speak and the interruption firing.
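The interaction of probability, the offset window, and seed can be sketched as below. This models the parameter semantics only and is not the server's code: the function name `plan_barge_ins` is hypothetical, and uniform sampling of the offset within the window is an assumption, since the documentation above does not state how the offset is drawn.

```python
import random


def plan_barge_ins(num_agent_turns, probability=0.2,
                   min_offset_ms=200, max_offset_ms=600, seed=None):
    """For each agent reply, return the interruption offset in ms,
    or None when the probability gate does not fire.

    Hypothetical model of the parameters; uniform sampling is assumed.
    """
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    if min_offset_ms < 0 or max_offset_ms < min_offset_ms:
        raise ValueError("need 0 <= min_offset_ms <= max_offset_ms")
    # A fixed seed makes the same replies get interrupted on every run.
    rng = random.Random(seed)
    plan = []
    for _ in range(num_agent_turns):
        if rng.random() < probability:
            plan.append(rng.randint(min_offset_ms, max_offset_ms))
        else:
            plan.append(None)
    return plan
```

In this model, a fixed seed is what lets a baseline run and a rerun after an agent change see interruptions at the same points.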
backchannel
| Parameter | Type | Range / Notes | Default |
|---|---|---|---|
| utterance | string | Non-empty after trim | mm-hmm |
| probability | number | 0 to 1 | 0.35 |
| min_offset_ms | int | 0 or above | 150 |
| max_offset_ms | int | at least min_offset_ms | 450 |
| seed | int or null | Reproducibility | null |
Backchannels repeat while the target is speaking. They do not consume a turn.
The UI form uses different offset defaults (1000/10000). The values above are the server defaults. When calling via SDK, specify offsets explicitly if you want the wider window.
directed_speech
| Parameter | Type | Range / Notes | Default |
|---|---|---|---|
| prompt | string or null | LLM instruction for the off-mic remark | (built-in template) |
| probability | number | 0 to 1 | 0.3 |
| lpf_cutoff_hz | number | 1 to 20000 | 800 |
| gain_db | number | -40 to 0 | -8 |
Low-pass filter and gain attenuation simulate the muffled, off-mic quality of someone turning their head away from the phone.
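To get a feel for what lpf_cutoff_hz and gain_db do to a signal, here is a rough sketch that applies a one-pole low-pass filter followed by attenuation. The filter design is a stand-in chosen for brevity; the documentation above does not specify which filter is used, and the names `db_to_amplitude` and `muffle` are hypothetical.

```python
import math


def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def muffle(samples, sample_rate_hz, lpf_cutoff_hz=800, gain_db=-8):
    """Apply a one-pole low-pass filter, then attenuate by gain_db.

    Illustrative approximation of an off-mic sound, not the actual
    processing chain.
    """
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2 * math.pi * lpf_cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the one-pole filter
    gain = db_to_amplitude(gain_db)
    out = []
    prev = 0.0
    for x in samples:
        # Move the filter state a fraction alpha toward the new sample.
        prev = prev + alpha * (x - prev)
        out.append(gain * prev)
    return out
```

With the default gain_db of -8, the amplitude factor is about 0.4, so the off-mic remark plays at roughly 40% of its original amplitude, with content above the cutoff reduced further.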
secondary_speaker
| Parameter | Type | Range / Notes | Default |
|---|---|---|---|
| secondary_prompt | string or null | LLM instruction for the second speaker's interjection | (built-in template) |
| secondary_voice | string | Named voice from Okareo's voice catalog | Cathy - Coworker |
| probability | number | 0 to 1 | 0.3 |
| inter_speaker_pause_ms | int | 0 to 5000 | 120 |
| lpf_cutoff_hz | number or null | 1 to 20000 (null disables) | 800 |
| gain_db | number | -40 to 0 | -8 |
Aliases accepted: prompt → secondary_prompt, voice → secondary_voice. The SDK's SecondarySpeakerAugmentation(voice=..., prompt=...) uses the alias names.
The UI form defaults to Carson - Curious Conversationalist. The server default is Cathy - Coworker. Specify secondary_voice explicitly when calling via SDK.
cap
| Parameter | Type | Range / Notes | Default |
|---|---|---|---|
| probability | number | 0 to 1 | 0.3 |
| pause_ms | int or null | 0 to 10000, maps to turn_transition_time on the target | 1000 |
When the probability gate fires, the driver emits multiple consecutive messages in a single turn. The number of messages comes from the driver's get_default_consecutive_messages() setting (typically 2). pause_ms controls the gap between the driver finishing and the target responding.
Reading Augmented Results
Compare the augmented run's mean_scores against a clean baseline. The expected pattern:
- avg_turn_taking_latency rises as the agent works harder against noise or interruptions.
- result_completed may drop if the agent can't handle the augmented conditions.
- response_loop flips to 0 if the agent gets stuck after an interruption.

For a structured comparison between baseline and augmented runs, see Experimentation and A/B Testing.
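For a quick manual comparison, a per-check difference between the two runs is often enough. The sketch below assumes you have already extracted each run's mean_scores into a plain dict keyed by check name; how you obtain those dicts from the result object is not shown. The helper name `score_deltas` is hypothetical, and the numbers are placeholders, not real results.

```python
def score_deltas(baseline, augmented):
    """Return augmented minus baseline for each check present in both runs."""
    return {
        name: augmented[name] - baseline[name]
        for name in baseline
        if name in augmented
    }


# Placeholder values for illustration only; these are not measured results.
baseline_scores = {"avg_turn_taking_latency": 800.0, "result_completed": 1.0}
augmented_scores = {"avg_turn_taking_latency": 1100.0, "result_completed": 0.5}

for check, delta in score_deltas(baseline_scores, augmented_scores).items():
    print(f"{check}: {delta:+.2f}")
```

A positive latency delta and a negative completion delta match the expected pattern above. Deltas of the opposite sign or near zero suggest the augmentation is not reaching the agent or is configured too mildly.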
Where to Go Next
- Voice Checks: checks that respond to augmented conditions.
- Experimentation and A/B Testing: diff augmented vs baseline runs.
- Load Testing: combine augmentation with concurrency.
Full runnable script: 06_augmentation.py