Qwen3-TTS-FastAPI

OpenAI-compatible text-to-speech API server powered by Qwen3-TTS. Drop-in replacement for Kokoro-FastAPI or any OpenAI TTS integration.

Features

  • OpenAI-compatible API - Works with any client expecting OpenAI TTS endpoints
  • Multiple model variants - 0.6B and 1.7B models with different capabilities
  • Voice cloning - Clone any voice from a short audio sample
  • Voice design - Generate voices from text descriptions (1.7B-VoiceDesign)
  • Style instruction - Control tone and emotion (1.7B models)
  • Streaming - Chunked streaming for lower time-to-first-audio
  • Multiple formats - MP3, WAV, OPUS, FLAC, AAC, PCM
  • Web UI - Built-in browser interface for testing
  • Docker support - GPU and CPU deployment options

Quick Start

Docker (GPU)

git clone https://github.com/jmdevita/Qwen3-TTS-FastAPI.git
cd Qwen3-TTS-FastAPI
cp .env.example .env
docker-compose up -d

Docker (CPU)

docker-compose -f docker-compose.cpu.yml up -d

Local

pip install -r requirements.txt
cp .env.example .env
cd app && python server.py

API Usage

Generate Speech

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, this is a test of the text to speech system.",
    "voice": "aiden",
    "response_format": "mp3"
  }' \
  --output speech.mp3

With Style Instruction (1.7B models)

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "I am so excited to share this news with you!",
    "voice": "aiden",
    "instruct": "Speak in a cheerful and energetic tone"
  }' \
  --output speech.mp3

Streaming Mode

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This is a longer text that will be streamed...",
    "voice": "aiden",
    "stream": true
  }' \
  --output speech.pcm
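Streaming mode returns raw PCM with no container header, so most players cannot open `speech.pcm` directly. A stdlib-only sketch that wraps the captured bytes in a WAV container; the 24 kHz mono 16-bit parameters are an assumption here, so adjust them to whatever the server actually emits:

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (stdlib only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()

# Example: wrap one second of silence
wav_bytes = pcm_to_wav(b"\x00\x00" * 24000)
```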

Voice Cloning

First, upload a reference voice:

curl -X POST http://localhost:8880/upload_voice \
  -F "name=my-voice" \
  -F "ref_text=This is the transcript of my audio sample." \
  -F "file=@reference.wav"

Then use it:

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello in my cloned voice!", "voice": "my-voice"}' \
  --output cloned.mp3
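The two-step flow can be sketched in Python as request specs; the dicts mirror `requests.request(**spec)` keyword arguments, and the base URL and field names are taken from the curl examples above (replace the path string with an open file handle when actually sending the upload):

```python
BASE_URL = "http://localhost:8880"

def upload_voice_request(name: str, ref_text: str, wav_path: str) -> dict:
    """Step 1: multipart form registering a reference voice.
    Pass an open binary file object in place of `wav_path` when sending."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/upload_voice",
        "data": {"name": name, "ref_text": ref_text},
        "files": {"file": wav_path},
    }

def clone_speech_request(text: str, voice: str) -> dict:
    """Step 2: synthesize speech, referencing the uploaded voice by name."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/v1/audio/speech",
        "json": {"input": text, "voice": voice},
    }
```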

Available Voices

Voice     Gender  Description
aiden     Male    Sunny American
dylan     Male    Energetic American
eric      Male    Deep American
ryan      Male    Calm American
vivian    Female  Gentle Chinese-American
serena    Female  Elegant American
sohee     Female  Warm Korean
ono_anna  Female  Soft Japanese
uncle_fu  Male    Warm Chinese elder

Model Variants

Model             Size  Preset Voices  Voice Clone  Style Instruction  Voice Design
0.6B-CustomVoice  ~2GB  Yes            No           No                 No
1.7B-CustomVoice  ~4GB  Yes            No           Yes                No
1.7B-VoiceDesign  ~4GB  No             No           Yes                Yes
0.6B-Base         ~2GB  No             Yes          No                 No
1.7B-Base         ~4GB  No             Yes          No                 No

Note: Voice cloning only works with Base models. CustomVoice models have preset speakers but cannot clone voices.

Set via MODEL_NAME environment variable:

  • 0.6B or 1.7B for CustomVoice variants (recommended)
  • Full HuggingFace path for specific variants

API Endpoints

Endpoint                 Method  Description
/v1/audio/speech         POST    Generate speech (OpenAI-compatible)
/v1/audio/speech/clone   POST    Generate with voice cloning
/v1/audio/speech/design  POST    Generate with voice design
/v1/models/load          POST    Hot-swap to a different model
/upload_voice            POST    Upload custom voice
/v1/voices               GET     List available voices
/v1/models               GET     List models
/health                  GET     Health check
/web/                    GET     Web UI
/docs                    GET     API documentation

Configuration

Key environment variables:

Variable        Default  Description
MODEL_NAME      0.6B     Model to load
DEVICE          cuda:0   cuda:0, cuda:1, or cpu
PORT            8880     Server port
DEFAULT_VOICE   aiden    Default voice
DEFAULT_FORMAT  mp3      Default audio format
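Putting the defaults above together, a minimal `.env` sketch (the values shown are the documented defaults; set MODEL_NAME to a full HuggingFace path instead to pin a specific variant):

```
MODEL_NAME=0.6B
DEVICE=cuda:0
PORT=8880
DEFAULT_VOICE=aiden
DEFAULT_FORMAT=mp3
```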

See docs/SETUP.md for complete configuration options.

Web Interface

Access the built-in web UI at: http://localhost:8880/web/

Features:

  • Text input with character count
  • Voice selection with descriptions
  • Audio format selection
  • Speed control
  • Style instruction input (1.7B models)
  • Model switching (hot-swap without restart)
  • Voice upload for cloning
  • Audio playback and download

Integration Examples

Open WebUI

Configure in Open WebUI TTS settings:

  • Engine: OpenAI
  • Base URL: http://localhost:8880/v1
  • Voice: aiden

Python

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")

response = client.audio.speech.create(
    model="tts-1",
    voice="aiden",
    input="Hello from Python!"
)

response.stream_to_file("output.mp3")

JavaScript

const response = await fetch('http://localhost:8880/v1/audio/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    input: 'Hello from JavaScript!',
    voice: 'aiden'
  })
});

const blob = await response.blob();
// e.g. play the result in the browser:
new Audio(URL.createObjectURL(blob)).play();

Documentation

  • Setup Guide (docs/SETUP.md) - Installation and configuration
  • API Docs (/docs) - Interactive API documentation, available while the server is running

License

MIT

Acknowledgments

  • Qwen3-TTS - the underlying text-to-speech models
