Voice mode

Headmaster supports real-time voice interaction — talk to the agent with your voice and hear responses spoken aloud.

Under the hood

Voice mode uses:

Speech-to-text (STT): Your spoken audio is transcribed to text and sent to the agent as a message. Headmaster uses a local faster-whisper model by default, or OpenAI Whisper for higher accuracy.
Text-to-speech (TTS): The agent’s text response is converted to speech audio and played back. Headmaster supports OpenAI, xAI, MiniMax, ElevenLabs, and Edge as TTS providers.

The cycle is: you speak → STT transcribes → agent processes → TTS speaks the response → you speak again.
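The cycle can be pictured as a simple loop. The sketch below is an illustration only, not Headmaster's implementation: `transcribe`, `ask_agent`, and `speak` are hypothetical stand-ins for the STT backend, the agent, and the TTS provider, and skipping empty transcriptions is an assumption.

```python
from typing import Callable, Iterable, List


def run_voice_cycle(
    audio_clips: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    ask_agent: Callable[[str], str],
    speak: Callable[[str], None],
) -> List[str]:
    """Run one STT -> agent -> TTS turn per recorded clip.

    All three callables are hypothetical placeholders, not Headmaster APIs.
    Returns the agent's text replies in order.
    """
    replies: List[str] = []
    for clip in audio_clips:
        text = transcribe(clip)   # STT: spoken audio -> text
        if not text.strip():
            continue              # assumption: ignore clips with no recognised speech
        reply = ask_agent(text)   # the agent processes the transcribed message
        speak(reply)              # TTS: text response -> spoken audio
        replies.append(reply)
    return replies
```

Passing the three stages in as callables keeps the loop independent of which STT backend or TTS provider is selected.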

Enabling voice mode

Open Settings → My Headmaster → Look → Voice.
Turn on Enable voice mode.
Choose a TTS provider and voice.
Choose an STT backend (local faster-whisper or OpenAI Whisper).
Save.

TTS providers

Provider	Voices	Notes
OpenAI	Alloy, Echo, Fable, Onyx, Nova, Shimmer	Natural, high quality. Requires OpenAI key.
xAI	Various	Requires xAI key.
MiniMax	Various	Requires MiniMax key.
ElevenLabs	5k-40k voice options	Premium quality. Requires ElevenLabs key.
Edge	Built-in voices	Free, no API key needed. Lower quality.
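As a rough illustration of the key requirements in the table, the sketch below reports which API key a chosen provider would be missing. The environment variable names are assumptions made for this example; Headmaster may store keys differently.

```python
import os
from typing import Dict, Mapping, Optional

# Provider -> environment variable assumed to hold its API key.
# The variable names are illustrative; None means no key is needed.
TTS_KEY_ENV: Dict[str, Optional[str]] = {
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "minimax": "MINIMAX_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "edge": None,
}


def missing_tts_key(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the name of the missing key variable, or None if nothing is missing."""
    env = os.environ if env is None else env
    name = provider.lower()
    if name not in TTS_KEY_ENV:
        raise ValueError(f"unknown TTS provider: {provider}")
    key_var = TTS_KEY_ENV[name]
    if key_var is None or env.get(key_var):
        return None  # no key required, or the key is already set
    return key_var
```

A check like this could run before saving voice settings, so that a provider without a key falls back to Edge rather than failing at playback time.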

STT backends

Backend	Quality	Notes
Local faster-whisper	Good	Free, runs locally, no API key. Default.
OpenAI Whisper	High	Requires OpenAI key. Better accuracy for accents and noise.

Using voice mode

In the desktop app

Click the microphone icon in the chat composer. The icon turns red to indicate recording. Speak your message, then click the icon again (or press Esc) to stop recording. The agent transcribes your speech, processes it, and speaks the response.

Push to talk

Enable Push to talk in voice settings. Hold the microphone button (or a keyboard shortcut) to talk, release to send. The agent responds with speech automatically.

Continuous conversation

In continuous mode, the agent listens for your speech, responds, then automatically listens again. You don’t need to click the microphone each time; just talk. Enable it in Settings → Voice → Continuous mode.

Interrupting

While the agent is speaking, click the stop button or press Esc to interrupt. The agent stops speaking and the partial audio is discarded. You can then speak a new message.
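One way to picture interruption is chunked playback that checks a stop flag between chunks. This is a sketch under assumptions, not Headmaster's audio pipeline: `play_chunk` is a hypothetical playback function, and the `threading.Event` stands in for the stop button or Esc key.

```python
import threading
from typing import Callable, Iterable


def play_until_interrupted(
    chunks: Iterable[bytes],
    play_chunk: Callable[[bytes], None],
    stop: threading.Event,
) -> bool:
    """Play audio chunks in order until finished or until `stop` is set.

    Returns True if every chunk was played, False if playback was interrupted.
    Chunks that were not yet played are simply never played (discarded).
    """
    for chunk in chunks:
        if stop.is_set():
            return False   # interrupted: drop the rest of the audio
        play_chunk(chunk)  # hypothetical playback call
    return True
```

Checking the flag between chunks means the interruption takes effect at the next chunk boundary, so smaller chunks give a quicker stop.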

Voice on messaging platforms

On Telegram and Discord, voice messages you send to your Headmaster bot are transcribed and processed. The agent’s response is sent as text, or as a voice message if TTS is enabled for that platform.

Voice settings

Setting	What it controls
TTS provider	Which service generates speech
TTS voice	Which voice to use
TTS speed	How fast the agent speaks (0.5x to 2x)
STT backend	Which service transcribes your speech
Auto-listen	Start listening automatically after the agent responds
Push to talk	Hold to talk, release to send
Continuous mode	Agent listens → responds → listens again
Voice volume	Output volume for TTS audio
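The settings above could be modeled as a small validated structure. The field names, identifiers, and defaults below are assumptions made for illustration, not Headmaster's configuration schema. Only the 0.5x to 2x speed range, the provider and backend lists, and faster-whisper as the default STT backend come from this page.

```python
from dataclasses import dataclass
from typing import List

TTS_PROVIDERS = {"openai", "xai", "minimax", "elevenlabs", "edge"}
STT_BACKENDS = {"faster-whisper", "openai-whisper"}  # identifiers are illustrative


@dataclass
class VoiceSettings:
    tts_provider: str = "edge"           # assumed default for this sketch
    tts_voice: str = ""                  # voice names depend on the provider
    tts_speed: float = 1.0               # documented range: 0.5x to 2x
    stt_backend: str = "faster-whisper"  # documented default
    auto_listen: bool = False
    push_to_talk: bool = False
    continuous_mode: bool = False
    voice_volume: float = 1.0            # scale is an assumption; not validated here

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if the settings are valid)."""
        problems: List[str] = []
        if self.tts_provider not in TTS_PROVIDERS:
            problems.append(f"unknown TTS provider: {self.tts_provider}")
        if self.stt_backend not in STT_BACKENDS:
            problems.append(f"unknown STT backend: {self.stt_backend}")
        if not 0.5 <= self.tts_speed <= 2.0:
            problems.append("TTS speed must be between 0.5 and 2.0")
        return problems
```

Returning a list of problems rather than raising on the first one lets a settings screen show every issue at once.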

Speech input button

The microphone button in the chat composer shows the current voice state:

Gray mic — voice mode is off. Click to start recording.
Red mic — recording in progress. Click to stop and send.
Blue mic — processing. The agent is transcribing or generating speech.
Green mic — speaking. The agent is speaking the response.
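The four button states can be read as a small state machine. The sketch below is one plausible reading, not the app's actual logic: the event names are hypothetical, and returning to gray after speaking is an assumption (in continuous mode the button would presumably go back to recording instead).

```python
from enum import Enum
from typing import Dict, Tuple


class MicState(Enum):
    OFF = "gray"          # voice mode is off
    RECORDING = "red"     # recording in progress
    PROCESSING = "blue"   # transcribing or generating speech
    SPEAKING = "green"    # the agent is speaking the response


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS: Dict[Tuple[MicState, str], MicState] = {
    (MicState.OFF, "click"): MicState.RECORDING,
    (MicState.RECORDING, "click"): MicState.PROCESSING,
    (MicState.PROCESSING, "audio_ready"): MicState.SPEAKING,
    (MicState.SPEAKING, "finished"): MicState.OFF,   # assumption
    (MicState.SPEAKING, "interrupt"): MicState.OFF,  # assumption
}


def next_state(state: MicState, event: str) -> MicState:
    """Return the next mic state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

For example, `next_state(MicState.RECORDING, "click")` yields `MicState.PROCESSING`, matching "Click to stop and send".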

Tips for better voice interaction

Speak clearly — the STT model works best with clear, moderate-paced speech.
Use a quiet environment — background noise reduces transcription accuracy.
Try different voices: some TTS voices sound more natural for your use case than others.
Use push to talk — prevents the agent from picking up background conversation as input.
Adjust speed — if the agent speaks too fast or slow, adjust the TTS speed setting.
