Voice Settings – Sapience Cloud

Voice Settings & Troubleshooting

Media support in browsers has evolved over time, and it is now complicated by material differences between desktop and mobile browsers.

Rest assured, Sapience takes care of 99% of the complexity for you and chooses sensible default settings based on your device and browser.

If you want to tweak the voice settings we use for dictation, you can find them under Menu > Settings & Customization > Advanced > Voice Settings.

💡 Most users should not touch these settings. For the audiophiles among you: go to town.

*The Voice Settings window, found under Menu > Settings & Customization > Advanced > Voice Settings.*

Voice Recording Details

Configure how audio is recorded and processed for voice transcription.

Transcription Model

Whisper: widely used in the industry, for example by Whisper Flow and any number of AI voice startups. It performs well in noisy environments, which is where most users are (keyboard typing, background air conditioning, etc.).

GPT Transcribe: supports a wider array of languages and makes fewer errors in a pristine audio environment, but is less tolerant of background noise.

Microphone Settings

Input Sample Rate

The frequency at which audio is captured from your microphone.

Option	Description	Recommendation
16 kHz	Optimized for speech recognition	Recommended - Whisper/GPT models are trained on 16 kHz audio
44.1 kHz	CD quality audio	Unnecessary for speech; larger files
48 kHz	Professional audio standard	Unnecessary for speech; larger files

Audio Channels

Whether to record in mono (single channel) or stereo (dual channel).

Option	Description	Recommendation
Mono	Single audio channel	Recommended - Speech only needs one channel; smaller files
Stereo	Left and right channels	Only useful for music or spatial audio
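To make "larger files" concrete for the sample rate and channel tables above, here is a rough back-of-the-envelope sketch of the raw, uncompressed audio size. It assumes 16-bit samples, which is a common choice but not something this page confirms about Sapience; the function name `pcmBytes` is purely illustrative.

```typescript
// Raw (uncompressed) PCM size for a recording.
// bytes = sample rate (Hz) x channels x bytes per sample x seconds
// Assumption: 16-bit samples (2 bytes each) unless stated otherwise.
function pcmBytes(
  sampleRateHz: number,
  channels: number,
  seconds: number,
  bytesPerSample: number = 2
): number {
  return sampleRateHz * channels * bytesPerSample * seconds;
}

const oneMinuteMono16k = pcmBytes(16000, 1, 60);   // 1,920,000 bytes
const oneMinuteStereo48k = pcmBytes(48000, 2, 60); // 11,520,000 bytes

// Stereo at 48 kHz carries six times the raw data of mono at 16 kHz.
console.log(oneMinuteStereo48k / oneMinuteMono16k); // 6
```

The compressed file will be much smaller than these figures, but the ratio shows why mono at 16 kHz is the lightest option for speech.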

Echo Cancellation

Reduces echo from speakers being picked up by the microphone.

Enable if you're not using headphones and speakers might create feedback

Recommended: ON - The transcription API does not do this processing

Noise Suppression

Reduces background noise (fans, traffic, ambient sounds).

Enable for noisy environments

Recommended: ON - The transcription API does not do this processing

Auto Gain Control

Automatically adjusts microphone volume to maintain consistent levels.

Enable to normalize volume if you speak at varying distances from the mic

Recommended: ON - Helps ensure consistent audio levels
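For readers curious how microphone settings like these typically reach the browser, the sketch below maps them onto the standard audio constraint names (`sampleRate`, `channelCount`, `echoCancellation`, `noiseSuppression`, `autoGainControl`) accepted by `navigator.mediaDevices.getUserMedia`. Whether Sapience does exactly this internally is an assumption; the `MicSettings` and `AudioConstraints` types and the `toAudioConstraints` function are hypothetical names invented for illustration.

```typescript
// Hypothetical shape for the microphone settings described above.
interface MicSettings {
  inputSampleRateHz: 16000 | 44100 | 48000;
  channels: "mono" | "stereo";
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

// A subset of the browser's MediaTrackConstraints dictionary, declared
// locally so the sketch compiles without the DOM type library.
interface AudioConstraints {
  sampleRate: { ideal: number };
  channelCount: { ideal: number };
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

function toAudioConstraints(s: MicSettings): AudioConstraints {
  return {
    // "ideal" is a preference: the browser picks the closest value the
    // device supports instead of rejecting the request outright.
    sampleRate: { ideal: s.inputSampleRateHz },
    channelCount: { ideal: s.channels === "mono" ? 1 : 2 },
    echoCancellation: s.echoCancellation,
    noiseSuppression: s.noiseSuppression,
    autoGainControl: s.autoGainControl,
  };
}

const recommendedMic: MicSettings = {
  inputSampleRateHz: 16000,
  channels: "mono",
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true,
};

const micConstraints = toAudioConstraints(recommendedMic);
// In a browser, this object would be passed as:
//   navigator.mediaDevices.getUserMedia({ audio: micConstraints })
console.log(micConstraints);
```

Because the values are requested as preferences, a device that cannot capture at 16 kHz may still record at its native rate; this is one reason the defaults can differ between desktop and mobile browsers.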

Recording Output

Output Format

The audio format used for the recorded file.

Format	Codec	File Size	API Compatibility	Recommendation
MP3	MPEG-1 Audio Layer III	Small	Best	Recommended - Most reliable with gpt-4o-transcribe
WebM	Opus	Smallest	Unreliable	May cause "invalid format" errors with newer models
WAV	PCM (uncompressed)	Large	Good	Works but creates unnecessarily large files

Why MP3? OpenAI's gpt-4o-transcribe model has stricter format requirements than whisper-1. WebM/Opus files frequently cause "invalid file format" errors. MP3 provides the best balance of compatibility and file size.

Output Sample Rate

The sample rate of the recorded audio file.

Should match Input Sample Rate for best quality

Recommended: 16 kHz - Matches what transcription models expect

Buffer Size

Size of the audio processing buffer. Affects latency vs. stability.

Option	Trade-off
4096	Lower latency, may drop audio on slower devices
8192	Balanced
16384	Recommended - Most stable, slight latency increase
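The latency side of this trade-off can be estimated with simple arithmetic. The sketch below assumes the buffer size is counted in sample frames, as it is for Web Audio processing buffers; this page does not state the unit, so treat the numbers as illustrative. The function name `bufferLatencyMs` is hypothetical.

```typescript
// Time needed to fill one buffer, in milliseconds.
// Assumption: bufferSizeFrames is a count of sample frames.
function bufferLatencyMs(bufferSizeFrames: number, sampleRateHz: number): number {
  return (bufferSizeFrames * 1000) / sampleRateHz;
}

for (const size of [4096, 8192, 16384]) {
  console.log(size + " frames at 16 kHz: " + bufferLatencyMs(size, 16000) + " ms");
}
// 4096 frames at 16 kHz: 256 ms
// 8192 frames at 16 kHz: 512 ms
// 16384 frames at 16 kHz: 1024 ms
```

Under these assumptions, each step up doubles the time before a chunk of audio is handed off for processing. For dictation, where audio is transcribed after it is recorded, that delay is usually a fair price for avoiding dropped audio.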

Output Bitrate

Overall bitrate for the encoded audio.

Option	Quality	File Size
64 kbps	Lower	Smallest
96 kbps	Good	Small
128 kbps	Recommended	Balanced
192 kbps	High	Larger
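As a rough guide to what the bitrates in the table mean for file size, the sketch below converts a constant bitrate and a duration into bytes. It ignores container and header overhead, so real files will be slightly larger; the function name `encodedBytes` is illustrative only.

```typescript
// Approximate size of constant-bitrate audio.
// bytes = kilobits per second x 1000 x seconds / 8
function encodedBytes(bitrateKbps: number, seconds: number): number {
  return (bitrateKbps * 1000 * seconds) / 8;
}

for (const kbps of [64, 96, 128, 192]) {
  console.log(kbps + " kbps, 1 minute: " + encodedBytes(kbps, 60) + " bytes");
}
// 64 kbps, 1 minute: 480000 bytes
// 96 kbps, 1 minute: 720000 bytes
// 128 kbps, 1 minute: 960000 bytes
// 192 kbps, 1 minute: 1440000 bytes
```

At the recommended 128 kbps, a minute of dictation comes to a little under 1 MB.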

Encoder Bitrate

Internal encoder bitrate (primarily affects WebM/Opus encoding).

Recommended: 96 kbps - Good quality for speech

MP3 Encoding

MP3 Encoder Bitrate

Bitrate used when converting audio to MP3 format.

Option	Use Case
64 kbps	Minimize file size, acceptable quality
96 kbps	Good balance
128 kbps	Recommended - Clear speech quality
192 kbps	High quality, larger files

Transcription

Convert to MP3 Before Sending

Automatically converts audio to MP3 before sending to the transcription API.

Enable if using WebM or WAV output format and experiencing API errors

Not needed if Output Format is already set to MP3

Adds slight processing time but ensures API compatibility
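The rule described above boils down to a single condition: conversion only does any work when the toggle is on and the recording is not already MP3. A minimal sketch of that logic follows; the type and function names are hypothetical and do not reflect Sapience's actual code.

```typescript
type OutputFormat = "mp3" | "webm" | "wav";

// Returns true when a recording would need to be re-encoded to MP3
// before being sent to the transcription API.
function needsMp3Conversion(format: OutputFormat, convertBeforeSending: boolean): boolean {
  return convertBeforeSending && format !== "mp3";
}

console.log(needsMp3Conversion("webm", true));  // true: WebM is re-encoded
console.log(needsMp3Conversion("wav", true));   // true: WAV is re-encoded
console.log(needsMp3Conversion("mp3", true));   // false: already MP3
console.log(needsMp3Conversion("webm", false)); // false: toggle is off
```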

Recommended Configuration

For the most reliable voice transcription experience:

Microphone:
  Input Sample Rate: 16 kHz
  Channels: Mono
  Echo Cancellation: ON
  Noise Suppression: ON
  Auto Gain Control: ON

Recording Output:
  Output Format: MP3
  Output Sample Rate: 16 kHz
  Buffer Size: 16384
  Output Bitrate: 128 kbps
  Encoder Bitrate: 96 kbps

MP3 Encoding:
  MP3 Encoder Bitrate: 128 kbps

Transcription:
  Convert to MP3 Before Sending: OFF (not needed when format is MP3)
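If it helps to see the recommended configuration as data, the sketch below expresses it as a typed object, together with a small check that flags combinations this page warns about. The `VoiceSettings` shape, its field names, and the `reviewSettings` function are all hypothetical; Sapience does not necessarily store or validate settings this way.

```typescript
// Hypothetical representation of the settings on this page.
interface VoiceSettings {
  inputSampleRateHz: number;
  channels: "mono" | "stereo";
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
  outputFormat: "mp3" | "webm" | "wav";
  outputSampleRateHz: number;
  bufferSize: 4096 | 8192 | 16384;
  outputBitrateKbps: number;
  encoderBitrateKbps: number;
  mp3EncoderBitrateKbps: number;
  convertToMp3BeforeSending: boolean;
}

const RECOMMENDED: VoiceSettings = {
  inputSampleRateHz: 16000,
  channels: "mono",
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true,
  outputFormat: "mp3",
  outputSampleRateHz: 16000,
  bufferSize: 16384,
  outputBitrateKbps: 128,
  encoderBitrateKbps: 96,
  mp3EncoderBitrateKbps: 128,
  convertToMp3BeforeSending: false,
};

// Returns human-readable notes for combinations the guidance above advises against.
function reviewSettings(s: VoiceSettings): string[] {
  const notes: string[] = [];
  if (s.outputSampleRateHz !== s.inputSampleRateHz) {
    notes.push("Output Sample Rate should match Input Sample Rate");
  }
  if (s.outputFormat === "mp3" && s.convertToMp3BeforeSending) {
    notes.push("Convert to MP3 Before Sending is not needed when Output Format is MP3");
  }
  if (s.outputFormat === "webm" && !s.convertToMp3BeforeSending) {
    notes.push('WebM may cause "invalid file format" errors; use MP3 or enable conversion');
  }
  return notes;
}

console.log(reviewSettings(RECOMMENDED)); // []
```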

Troubleshooting

"Invalid file format" errors

Change Output Format to MP3

Or enable Convert to MP3 Before Sending

Audio sounds choppy or has gaps

Increase Buffer Size to 16384

Close other browser tabs using the microphone

Transcription misses words or is inaccurate

Enable Noise Suppression to reduce background noise

Ensure Input Sample Rate is 16 kHz

Speak clearly and at a consistent distance from the microphone

Recording fails to start

Check browser permissions for microphone access

Ensure no other application is using the microphone

Try refreshing the page
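For the technically inclined, browsers report why a microphone request failed through a named error. The sketch below maps two standard error names raised by `getUserMedia` (`NotAllowedError` for a denied permission, `NotReadableError` for a device that cannot be read, often because another application holds it) onto the steps above. Whether Sapience surfaces these names to users is an assumption, and `adviceForMicError` is a hypothetical helper.

```typescript
// Maps a getUserMedia error name to the matching troubleshooting step.
function adviceForMicError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      // The user or a browser policy denied microphone access.
      return "Check browser permissions for microphone access";
    case "NotReadableError":
      // The device exists but could not be read, for example because
      // another application is currently using it.
      return "Ensure no other application is using the microphone";
    default:
      return "Try refreshing the page";
  }
}

console.log(adviceForMicError("NotAllowedError"));
console.log(adviceForMicError("NotReadableError"));
console.log(adviceForMicError("SomethingElse"));
```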