feat(whisperx): add whisperx backend for transcription with speaker diarization #8299
✅ Deploy Preview for localai ready!
4dcf358 to 7bf3852
Hi @mudler, I have updated the code based on your feedback.
3e5133d to 0d10ffb
Hi @mudler, hope you're having a good weekend.
mudler
left a comment
Looking good, I will test as well on my setup once on master. Thanks!
CI seems to fail (the error looks genuine).
Head branch was pushed to by a user without write access
dad9ac3 to 17d5edb
Add speaker field to the gRPC TranscriptSegment message and map it through the Go schema, enabling backends to return speaker labels. Signed-off-by: eureka928 <[email protected]>
Add Python gRPC backend using WhisperX for speech-to-text with word-level timestamps, forced alignment, and speaker diarization via pyannote-audio when HF_TOKEN is provided. Signed-off-by: eureka928 <[email protected]>
Signed-off-by: eureka928 <[email protected]>
Signed-off-by: eureka928 <[email protected]>
Signed-off-by: eureka928 <[email protected]>
…ments
Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions
Signed-off-by: eureka928 <[email protected]>
Signed-off-by: eureka928 <[email protected]>
17d5edb to 3f93840
Can you run the CI again?
Hi @mudler, this failure isn't from my code update; it's a pre-existing issue unrelated to this PR.
Description
This PR adds a new WhisperX Python backend that provides transcription with word-level timestamps, forced alignment, and speaker diarization (identifying who is speaking) via pyannote-audio.
Closes #3375
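For reviewers, here is a minimal sketch of the behaviour this PR describes: diarization runs only when the request asks for it and `HF_TOKEN` is set, and each segment carries a `speaker` label that stays empty otherwise. The names `Segment`, `should_diarize`, and `label_segments` are hypothetical and do not come from the PR's code. The maximum-overlap rule is a simplification chosen for illustration, not necessarily how the backend assigns speakers.

```python
import os
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a transcript segment (not the PR's type)."""
    start: float
    end: float
    text: str
    speaker: str = ""  # stays empty when no diarization was performed


def should_diarize(diarize_requested: bool, env=os.environ) -> bool:
    """Diarize only if the request asks for it and HF_TOKEN is available."""
    return diarize_requested and bool(env.get("HF_TOKEN"))


def label_segments(segments, turns):
    """Give each segment the speaker whose turn overlaps it the most.

    `turns` is a list of (start, end, speaker) tuples, a simplified
    stand-in for diarization output (assumption for this sketch).
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "", 0.0
        for t_start, t_end, speaker in turns:
            # Length of the time interval shared by segment and turn.
            overlap = min(seg.end, t_end) - max(seg.start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append(Segment(seg.start, seg.end, seg.text, best_speaker))
    return labeled
```

With no speaker turns (for example when `should_diarize` returns `False`), every segment keeps an empty `speaker`, which matches the backward-compatible behaviour described for existing backends.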
Key changes:
- `TranscriptSegment` message with a `speaker` field (backward-compatible; existing backends leave it empty)
- `Speaker` field through the Go schema (`core/schema/transcription.go`) and backend mapper (`core/backend/transcript.go`)
- `backend/python/whisperx/` backend with gRPC server, requirements for CPU/CUDA 12/CUDA 13/ROCm, and unit tests
- `backend/index.yaml`, and CI workflow

Speaker diarization requires a HuggingFace token (`HF_TOKEN` env var) with access to pyannote models, and is activated by setting `diarize=true` in the transcription request.

Notes for Reviewers
Signed commits