Griber is an open-source speech-to-text API compatible with OpenAI's Realtime API, written in Go and served over WebSockets. It supports multiple ASR providers, including sherpa-onnx and a mock provider for testing.
- OpenAI Compatible: Implements the OpenAI Realtime API protocol.
- Modular ASR: Pluggable ASR backends. sherpa-onnx is currently the only speech backend (alongside the mock provider for testing); other providers are planned.
- Configurable: Full control via config.yaml and environment variables.
- Go 1.21+
- ONNX Runtime libraries (for sherpa-onnx)
- Clone the repository:
git clone https://github.com/aira-id/griber.git
cd griber
- Install dependencies:
go mod download
- Setup ASR models (see Model Setup below).
Griber requires pre-trained ONNX models to perform speech-to-text. By default, it looks for models in the models/ directory.
Download the streaming Zipformer models from Hugging Face:
- Indonesian (ID): sherpa-onnx-streaming-zipformer2-id
- English (EN): sherpa-onnx-streaming-zipformer-en-2023-06-26
# Create models directory if it doesn't exist
mkdir -p models
# Download Indonesian model into models/
git clone https://huggingface.co/spacewave/sherpa-onnx-streaming-zipformer2-id models/sherpa-onnx-streaming-zipformer2-id
# Download English model into models/
git clone https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26 models/sherpa-onnx-streaming-zipformer-en-2023-06-26
Your structure should look like this:
griber/
├── models/
│   ├── sherpa-onnx-streaming-zipformer2-id/
│   │   ├── encoder-iter-100000-avg-15-chunk-32-left-256.onnx
│   │   ├── decoder-iter-100000-avg-15-chunk-32-left-256.onnx
│   │   ├── joiner-iter-100000-avg-15-chunk-32-left-256.onnx
│   │   └── tokens.txt
│   └── sherpa-onnx-streaming-zipformer-en-2023-06-26/
│       ├── encoder-epoch-99-avg-1-chunk-16-left-128.onnx
│       ├── decoder-epoch-99-avg-1-chunk-16-left-128.onnx
│       ├── joiner-epoch-99-avg-1-chunk-16-left-128.onnx
│       └── tokens.txt
└── config.yaml
Ensure the file names in config.yaml match the files you downloaded.
go run main.go
The server starts on port 8080 by default.
Griber uses config.yaml for main configuration. Environment variables can also be used for most settings.
ws://localhost:8080/v1/realtime
This server implements the OpenAI Realtime API protocol for speech-to-text. For detailed documentation on client events, server events, and message formats, please refer to the official OpenAI Realtime API Documentation.
Use standard OpenAI-compatible credentials and SDKs (where applicable) or raw WebSockets to interact with this endpoint.
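For raw WebSocket clients, the OpenAI Realtime API carries audio in `input_audio_buffer.append` events whose `audio` field holds base64-encoded audio bytes. The sketch below builds such a message from 16-bit PCM samples using only the Go standard library. It assumes Griber accepts this event with PCM16 input, which this README does not spell out; `appendEvent` and `buildAppendEvent` are illustrative names, and sending the resulting bytes over the connection is left to whichever WebSocket library you use.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// appendEvent mirrors the shape of the Realtime API client event
// "input_audio_buffer.append". (Illustrative type name.)
type appendEvent struct {
	Type  string `json:"type"`
	Audio string `json:"audio"`
}

// buildAppendEvent packs samples as little-endian 16-bit PCM,
// base64-encodes the bytes, and returns the JSON message.
func buildAppendEvent(samples []int16) ([]byte, error) {
	pcm := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(pcm[2*i:], uint16(s))
	}
	return json.Marshal(appendEvent{
		Type:  "input_audio_buffer.append",
		Audio: base64.StdEncoding.EncodeToString(pcm),
	})
}

func main() {
	// Three dummy samples stand in for a real microphone chunk.
	msg, err := buildAppendEvent([]int16{0, 1, -1})
	if err != nil {
		panic(err)
	}
	// Write msg as a text frame to ws://localhost:8080/v1/realtime.
	fmt.Println(string(msg))
}
```

Keeping message construction separate from the transport makes it easy to test the payload without a running server.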
A simple web client is provided in the client/ directory to demonstrate real-time transcription.
- Start the Griber server:
go run main.go
- Open client/index.html in your web browser.
- Allow microphone access when prompted.
- Click "Connect" to start the session and begin speaking.
- You should see real-time transcription results.
Note: Ensure the server is running on localhost:8080 (default).
- Confidence Scores: Griber currently uses sherpa-onnx's OnlineRecognizer for real-time transcription, which does not generate confidence scores. We plan to integrate OfflineRecognizer in the future to support this feature.
- API Protocols: Only the WebSocket API endpoint is currently supported. Support for other protocols (e.g., HTTP REST and WebRTC) will be added in upcoming releases.
- Language Support: Language availability depends on the specific models currently configured and loaded.
We welcome contributions from the community! Here are a few ways you can help improve Griber:
- Testing & Compatibility: Help us ensure high compatibility with the OpenAI Realtime API by testing with various clients and scenarios.
- Reporting Issues: Found a bug or have a suggestion? Open an issue to let us know.
- Submitting Pull Requests: Contributions are always welcome! Feel free to fork the repository and submit PRs for bug fixes or enhancements.
- Adding New Features: We're looking to expand support for other ASR providers, TTS engines, and additional API protocols.
- Improving Documentation: Help us make Griber easier to use by improving guides, examples, and API references.
This project is licensed under the MIT License - see the LICENSE file for details.