Qwen3-TTS-FastAPI

OpenAI-compatible text-to-speech API server powered by Qwen3-TTS. Drop-in replacement for Kokoro-FastAPI or any OpenAI TTS integration.

Features

  • OpenAI-compatible API - Works with any client expecting OpenAI TTS endpoints
  • Multiple model variants - 0.6B and 1.7B models with different capabilities
  • Voice cloning - Clone any voice from a short audio sample
  • Voice design - Generate voices from text descriptions (1.7B-VoiceDesign)
  • Style instruction - Control tone and emotion (1.7B models)
  • Streaming - Chunked streaming for lower time-to-first-audio
  • Multiple formats - MP3, WAV, OPUS, FLAC, AAC, PCM
  • Web UI - Built-in browser interface for testing
  • Docker support - GPU and CPU deployment options

Quick Start

Docker (GPU)

git clone https://github.com/jmdevita/Qwen3-TTS-FastAPI.git
cd Qwen3-TTS-FastAPI
cp .env.example .env
docker-compose up -d

Docker (CPU)

docker-compose -f docker-compose.cpu.yml up -d

Local

pip install -r requirements.txt
cp .env.example .env
cd app && python server.py

API Usage

Generate Speech

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, this is a test of the text to speech system.",
    "voice": "aiden",
    "response_format": "mp3"
  }' \
  --output speech.mp3

With Style Instruction (1.7B models)

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "I am so excited to share this news with you!",
    "voice": "aiden",
    "instruct": "Speak in a cheerful and energetic tone"
  }' \
  --output speech.mp3

Streaming Mode

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This is a longer text that will be streamed...",
    "voice": "aiden",
    "stream": true
  }' \
  --output speech.pcm
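Streaming mode returns raw PCM with no container header, so most players cannot open `speech.pcm` directly. A stdlib-only sketch that wraps the captured bytes in a WAV container; the 24 kHz mono 16-bit parameters are an assumption here, so adjust them to whatever the server actually emits:

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (stdlib only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()

# Example: wrap one second of silence
wav_bytes = pcm_to_wav(b"\x00\x00" * 24000)
```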

Voice Cloning

First, upload a reference voice:

curl -X POST http://localhost:8880/upload_voice \
  -F "name=my-voice" \
  -F "ref_text=This is the transcript of my audio sample." \
  -F "file=@reference.wav"

Then use it:

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello in my cloned voice!", "voice": "my-voice"}' \
  --output cloned.mp3
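The two-step flow can be sketched in Python as request specs; the dicts mirror `requests.request(**spec)` keyword arguments, and the base URL and field names are taken from the curl examples above (replace the path string with an open file handle when actually sending the upload):

```python
BASE_URL = "http://localhost:8880"

def upload_voice_request(name: str, ref_text: str, wav_path: str) -> dict:
    """Step 1: multipart form registering a reference voice.
    Pass an open binary file object in place of `wav_path` when sending."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/upload_voice",
        "data": {"name": name, "ref_text": ref_text},
        "files": {"file": wav_path},
    }

def clone_speech_request(text: str, voice: str) -> dict:
    """Step 2: synthesize speech, referencing the uploaded voice by name."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/v1/audio/speech",
        "json": {"input": text, "voice": voice},
    }
```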

Available Voices

Voice     Gender  Description
aiden     Male    Sunny American
dylan     Male    Energetic American
eric      Male    Deep American
ryan      Male    Calm American
vivian    Female  Gentle Chinese-American
serena    Female  Elegant American
sohee     Female  Warm Korean
ono_anna  Female  Soft Japanese
uncle_fu  Male    Warm Chinese elder

Model Variants

Model             Size  Preset Voices  Voice Clone  Style Instruction  Voice Design
0.6B-CustomVoice  ~2GB  Yes            No           No                 No
1.7B-CustomVoice  ~4GB  Yes            No           Yes                No
1.7B-VoiceDesign  ~4GB  No             No           Yes                Yes
0.6B-Base         ~2GB  No             Yes          No                 No
1.7B-Base         ~4GB  No             Yes          No                 No

Note: Voice cloning only works with Base models. CustomVoice models have preset speakers but cannot clone voices.

Set via MODEL_NAME environment variable:

  • 0.6B or 1.7B for CustomVoice variants (recommended)
  • Full HuggingFace path for specific variants

API Endpoints

Endpoint                 Method  Description
/v1/audio/speech         POST    Generate speech (OpenAI-compatible)
/v1/audio/speech/clone   POST    Generate with voice cloning
/v1/audio/speech/design  POST    Generate with voice design
/v1/models/load          POST    Hot-swap to a different model
/upload_voice            POST    Upload custom voice
/v1/voices               GET     List available voices
/v1/models               GET     List models
/health                  GET     Health check
/web/                    GET     Web UI
/docs                    GET     API documentation

Configuration

Key environment variables:

Variable        Default  Description
MODEL_NAME      0.6B     Model to load
DEVICE          cuda:0   cuda:0, cuda:1, or cpu
PORT            8880     Server port
DEFAULT_VOICE   aiden    Default voice
DEFAULT_FORMAT  mp3      Default audio format
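Putting the defaults above together, a minimal `.env` sketch (the values shown are the documented defaults; set MODEL_NAME to a full HuggingFace path instead to pin a specific variant):

```
MODEL_NAME=0.6B
DEVICE=cuda:0
PORT=8880
DEFAULT_VOICE=aiden
DEFAULT_FORMAT=mp3
```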

See docs/SETUP.md for complete configuration options.

Web Interface

Access the built-in web UI at: http://localhost:8880/web/

Features:

  • Text input with character count
  • Voice selection with descriptions
  • Audio format selection
  • Speed control
  • Style instruction input (1.7B models)
  • Model switching (hot-swap without restart)
  • Voice upload for cloning
  • Audio playback and download

Integration Examples

Open WebUI

Configure in Open WebUI TTS settings:

  • Engine: OpenAI
  • Base URL: http://localhost:8880/v1
  • Voice: aiden

Python

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")

response = client.audio.speech.create(
    model="tts-1",
    voice="aiden",
    input="Hello from Python!"
)

response.stream_to_file("output.mp3")

JavaScript

const response = await fetch('http://localhost:8880/v1/audio/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    input: 'Hello from JavaScript!',
    voice: 'aiden'
  })
});

const blob = await response.blob();
// e.g. play the result in the browser:
new Audio(URL.createObjectURL(blob)).play();

Documentation

  • Setup Guide (docs/SETUP.md) - Installation and configuration
  • API Docs (/docs) - Interactive API documentation, available while the server is running

License

MIT

Acknowledgments

  • Qwen3-TTS - the underlying text-to-speech models
