OpenAI-compatible text-to-speech API server powered by Qwen3-TTS. Drop-in replacement for Kokoro-FastAPI or any OpenAI TTS integration.
- OpenAI-compatible API - Works with any client expecting OpenAI TTS endpoints
- Multiple model variants - 0.6B and 1.7B models with different capabilities
- Voice cloning - Clone any voice from a short audio sample
- Voice design - Generate voices from text descriptions (1.7B-VoiceDesign)
- Style instruction - Control tone and emotion (1.7B models)
- Streaming - Chunked streaming for lower time-to-first-audio
- Multiple formats - MP3, WAV, OPUS, FLAC, AAC, PCM
- Web UI - Built-in browser interface for testing
- Docker support - GPU and CPU deployment options
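The streaming path returns raw PCM with no container. A minimal sketch of wrapping captured PCM bytes into a playable WAV file, assuming 24 kHz, 16-bit, mono output (verify the actual sample rate against your model's configuration):

```python
import wave

def pcm_to_wav(pcm_bytes: bytes, out_path: str,
               sample_rate: int = 24000, channels: int = 1) -> None:
    """Wrap raw 16-bit PCM in a WAV container so standard players can open it."""
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)

# Example: wrap the streamed output saved by curl
# pcm_to_wav(open("speech.pcm", "rb").read(), "speech.wav")
```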
Docker (GPU):

```bash
git clone https://github.com/your-repo/Qwen3-TTS-FastAPI.git
cd Qwen3-TTS-FastAPI
cp .env.example .env
docker-compose up -d
```

CPU-only:

```bash
docker-compose -f docker-compose.cpu.yml up -d
```

Manual install:

```bash
pip install -r requirements.txt
cp .env.example .env
cd app && python server.py
```

Generate speech:

```bash
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, this is a test of the text to speech system.",
    "voice": "aiden",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```

With a style instruction (1.7B models):

```bash
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "I am so excited to share this news with you!",
    "voice": "aiden",
    "instruct": "Speak in a cheerful and energetic tone"
  }' \
  --output speech.mp3
```

Streaming (raw PCM):

```bash
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This is a longer text that will be streamed...",
    "voice": "aiden",
    "stream": true
  }' \
  --output speech.pcm
```

For voice cloning, first upload a reference voice:

```bash
curl -X POST http://localhost:8880/upload_voice \
  -F "name=my-voice" \
  -F "ref_text=This is the transcript of my audio sample." \
  -F "file=@reference.wav"
```

Then use it:

```bash
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello in my cloned voice!", "voice": "my-voice"}' \
  --output cloned.mp3
```

Preset voices:

| Voice | Gender | Description |
|---|---|---|
| aiden | Male | Sunny American |
| dylan | Male | Energetic American |
| eric | Male | Deep American |
| ryan | Male | Calm American |
| vivian | Female | Gentle Chinese-American |
| serena | Female | Elegant American |
| sohee | Female | Warm Korean |
| ono_anna | Female | Soft Japanese |
| uncle_fu | Male | Warm Chinese elder |

Model variants:

| Model | Size | Preset Voices | Voice Clone | Style Instruction | Voice Design |
|---|---|---|---|---|---|
| 0.6B-CustomVoice | ~2GB | Yes | No | No | No |
| 1.7B-CustomVoice | ~4GB | Yes | No | Yes | No |
| 1.7B-VoiceDesign | ~4GB | No | No | Yes | Yes |
| 0.6B-Base | ~2GB | No | Yes | No | No |
| 1.7B-Base | ~4GB | No | Yes | No | No |
Note: Voice cloning only works with Base models. CustomVoice models have preset speakers but cannot clone voices.
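The capability matrix above can be encoded as a small lookup, for example to pick a model variant for a required feature. Model names and capabilities are taken from the table; the helper itself is an illustrative sketch, not part of the server API:

```python
# Capabilities per model variant, mirroring the table above.
CAPABILITIES = {
    "0.6B-CustomVoice": {"preset_voices"},
    "1.7B-CustomVoice": {"preset_voices", "style_instruction"},
    "1.7B-VoiceDesign": {"style_instruction", "voice_design"},
    "0.6B-Base":        {"voice_clone"},
    "1.7B-Base":        {"voice_clone"},
}

def models_supporting(feature: str) -> list[str]:
    """Return the model variants that support the given feature."""
    return [name for name, caps in CAPABILITIES.items() if feature in caps]

print(models_supporting("voice_clone"))   # ['0.6B-Base', '1.7B-Base']
```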
Set via the `MODEL_NAME` environment variable:

- `0.6B` or `1.7B` for the CustomVoice variants (recommended)
- A full HuggingFace path for a specific variant

| Endpoint | Method | Description |
|---|---|---|
| `/v1/audio/speech` | POST | Generate speech (OpenAI-compatible) |
| `/v1/audio/speech/clone` | POST | Generate with voice cloning |
| `/v1/audio/speech/design` | POST | Generate with voice design |
| `/v1/models/load` | POST | Hot-swap to a different model |
| `/upload_voice` | POST | Upload a custom voice |
| `/v1/voices` | GET | List available voices |
| `/v1/models` | GET | List models |
| `/health` | GET | Health check |
| `/web/` | GET | Web UI |
| `/docs` | GET | API documentation |
Key environment variables:

| Variable | Default | Description |
|---|---|---|
| `MODEL_NAME` | `0.6B` | Model to load |
| `DEVICE` | `cuda:0` | `cuda:0`, `cuda:1`, or `cpu` |
| `PORT` | `8880` | Server port |
| `DEFAULT_VOICE` | `aiden` | Default voice |
| `DEFAULT_FORMAT` | `mp3` | Default audio format |
See docs/SETUP.md for complete configuration options.
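The variables above can be resolved with plain `os.environ` lookups. A sketch of the defaults from the table (the `Settings` class itself is illustrative, not the server's actual config code):

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Server settings with the documented environment-variable defaults."""
    model_name: str = os.environ.get("MODEL_NAME", "0.6B")
    device: str = os.environ.get("DEVICE", "cuda:0")
    port: int = int(os.environ.get("PORT", "8880"))
    default_voice: str = os.environ.get("DEFAULT_VOICE", "aiden")
    default_format: str = os.environ.get("DEFAULT_FORMAT", "mp3")

settings = Settings()
```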
Access the built-in web UI at: http://localhost:8880/web/
Features:
- Text input with character count
- Voice selection with descriptions
- Audio format selection
- Speed control
- Style instruction input (1.7B models)
- Model switching (hot-swap without restart)
- Voice upload for cloning
- Audio playback and download
Configure in Open WebUI TTS settings:
- Engine: OpenAI
- Base URL: `http://localhost:8880/v1`
- Voice: `aiden`
Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")

response = client.audio.speech.create(
    model="tts-1",
    voice="aiden",
    input="Hello from Python!"
)
response.stream_to_file("output.mp3")
```

JavaScript (fetch):

```javascript
const response = await fetch('http://localhost:8880/v1/audio/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    input: 'Hello from JavaScript!',
    voice: 'aiden'
  })
});
const blob = await response.blob();
```

- Setup Guide - Installation and configuration
- API Docs - Interactive API documentation (when running)
MIT