feat(whisperx): add whisperx backend for transcription with speaker diarization #8299

eureka928 · 2026-01-30T12:54:35Z

Description

This PR adds a new WhisperX Python backend that provides transcription with word-level timestamps (via forced alignment) and speaker diarization (identifying who is speaking) via pyannote-audio.
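Diarization produces speaker turns on the audio timeline, while transcription produces text segments. Attaching a speaker to each segment means reconciling the two, and a common approach is to pick the speaker whose turns overlap the segment the most. The sketch below illustrates that idea only: `Turn`, `assign_speaker`, and the `SPEAKER_00`-style labels are illustrative assumptions, not code from this PR or from WhisperX.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One diarization turn: a speaker label active over [start, end) seconds."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(seg_start: float, seg_end: float, turns: list[Turn]) -> Optional[str]:
    """Return the speaker whose turns overlap the segment the most, or None.

    Overlap is summed per speaker, so a speaker with several short turns
    inside the segment can win over one long partial turn.
    """
    totals: dict[str, float] = {}
    for t in turns:
        ov = overlap(seg_start, seg_end, t.start, t.end)
        if ov > 0.0:
            totals[t.speaker] = totals.get(t.speaker, 0.0) + ov
    if not totals:
        return None  # no speaker active during this segment
    return max(totals, key=totals.get)


# Hypothetical diarization output for illustration.
turns = [
    Turn(0.0, 4.2, "SPEAKER_00"),
    Turn(4.2, 9.0, "SPEAKER_01"),
]
print(assign_speaker(0.5, 3.9, turns))    # SPEAKER_00
print(assign_speaker(3.5, 8.0, turns))    # SPEAKER_01 (3.8 s vs 0.7 s overlap)
print(assign_speaker(10.0, 11.0, turns))  # None
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the choice stable when one speaker has several short turns inside a long segment.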

Closes #3375

Key changes:

- Extends the gRPC `TranscriptSegment` message with a `speaker` field (backward-compatible: existing backends leave it empty)
- Maps the new `Speaker` field through the Go schema (`core/schema/transcription.go`) and backend mapper (`core/backend/transcript.go`)
- Adds the full `backend/python/whisperx/` backend with a gRPC server, requirements for CPU/CUDA 12/CUDA 13/ROCm, and unit tests
- Registers the backend in the `Makefile`, `backend/index.yaml`, and the CI workflow
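A minimal sketch of why adding the `speaker` field is backward-compatible: in proto3 an unset string field reads back as `""`, so segments from backends that never set it remain valid. The `TranscriptSegment` dataclass below is a hypothetical Python stand-in for the gRPC message (times are kept as float seconds purely for illustration and are not the message's real field types), and the input dicts assume WhisperX-style segments with `start`, `end`, `text`, and an optional `speaker` key.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TranscriptSegment:
    """Hypothetical stand-in for the gRPC TranscriptSegment message.

    `speaker` defaults to "" so backends without diarization keep producing
    valid segments, mirroring proto3's empty default for unset strings.
    """
    id: int
    start: float
    end: float
    text: str
    speaker: str = ""


def to_segments(raw: list[dict[str, Any]]) -> list[TranscriptSegment]:
    """Map WhisperX-style segment dicts to TranscriptSegment records.

    "speaker" is assumed to be present only when diarization ran,
    so it is read with a "" fallback.
    """
    return [
        TranscriptSegment(
            id=i,
            start=float(seg["start"]),
            end=float(seg["end"]),
            text=seg["text"].strip(),
            speaker=seg.get("speaker", ""),
        )
        for i, seg in enumerate(raw)
    ]


segs = to_segments([
    {"start": 0.0, "end": 2.5, "text": " Hello there.", "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 4.0, "text": " Hi!"},  # no diarization label
])
print(segs[0].speaker)        # SPEAKER_00
print(repr(segs[1].speaker))  # ''
```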

Speaker diarization requires a HuggingFace token (HF_TOKEN env var) with access to pyannote models, and is activated by setting diarize=true in the transcription request.
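The activation rule above can be sketched as a small gate. This assumes the backend silently falls back to plain transcription when `diarize=true` is requested but no `HF_TOKEN` is set; the actual backend may instead return an error. `diarization_token` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def diarization_token(diarize: bool, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Hugging Face token to use for diarization, or None to skip it.

    Diarization runs only when the request sets diarize=true AND a non-empty
    HF_TOKEN is available; otherwise the caller does plain transcription and
    leaves the speaker field empty (assumed fallback, see lead-in).
    """
    env = os.environ if env is None else env
    if not diarize:
        return None
    token = env.get("HF_TOKEN", "").strip()
    return token or None


print(diarization_token(True, {"HF_TOKEN": "hf_example"}))   # hf_example
print(diarization_token(True, {}))                           # None
print(diarization_token(False, {"HF_TOKEN": "hf_example"}))  # None
```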

Notes for Reviewers

- The alignment model is cached per language to avoid reloading on every transcription call
- The diarization pipeline is lazily initialized and reused across calls
- Timestamp handling matches the existing faster-whisper convention
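The two caching behaviours described in these notes can be sketched as a small helper. `ModelCache` and the injected loader callables are hypothetical names for illustration; the PR's actual code may structure this differently. Loaders are passed in so the sketch runs without WhisperX or pyannote installed.

```python
import threading
from typing import Callable, Dict, Generic, Optional, TypeVar

M = TypeVar("M")  # alignment model type
P = TypeVar("P")  # diarization pipeline type


class ModelCache(Generic[M, P]):
    """Per-language alignment-model cache plus a lazily built diarization pipeline."""

    def __init__(self, load_align: Callable[[str], M], load_diarizer: Callable[[], P]) -> None:
        self._load_align = load_align
        self._load_diarizer = load_diarizer
        self._align: Dict[str, M] = {}
        self._diarizer: Optional[P] = None
        # A gRPC server backed by a thread pool may call in concurrently.
        self._lock = threading.Lock()

    def alignment_model(self, language: str) -> M:
        """Load the alignment model for `language` once, then reuse it."""
        with self._lock:
            if language not in self._align:
                self._align[language] = self._load_align(language)
            return self._align[language]

    def diarizer(self) -> P:
        """Build the diarization pipeline on first use only."""
        with self._lock:
            if self._diarizer is None:
                self._diarizer = self._load_diarizer()
            return self._diarizer


# Usage with fake loaders that count how often they are called.
calls = {"align": 0, "diar": 0}


def fake_align(lang: str) -> str:
    calls["align"] += 1
    return f"align-model-{lang}"


def fake_diar() -> str:
    calls["diar"] += 1
    return "diarization-pipeline"


cache = ModelCache(fake_align, fake_diar)
cache.alignment_model("en")
cache.alignment_model("en")
cache.alignment_model("de")
cache.diarizer()
cache.diarizer()
print(calls)  # {'align': 2, 'diar': 1}
```

A single lock keeps the sketch simple but serializes loads across languages; a per-key lock would let different languages load in parallel at the cost of more bookkeeping.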

Signed commits

Yes, I signed my commits.

netlify · 2026-01-30T12:54:40Z

✅ Deploy Preview for localai ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | `3f93840` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/localai/deploys/697ebd6931f076000884512d |
| 😎 Deploy Preview | https://deploy-preview-8299--localai.netlify.app |


eureka928 · 2026-01-30T13:01:49Z

@mudler @neurocis Nice to meet you, and I'm glad to submit my first PR here.

Could you review it?

Thank you for your time.

backend/python/whisperx/requirements-cpu.txt

backend/python/whisperx/requirements-cublas12.txt

backend/python/whisperx/requirements-cublas13.txt

eureka928 · 2026-01-30T20:39:42Z

Hi @mudler, I have updated the code based on your feedback.
Please let me know if you have any further feedback after your review.

eureka928 · 2026-01-31T08:18:19Z

Hi @mudler, hope you're having a good weekend.
Could you give me more feedback once you've reviewed it?
Thank you, and have a nice weekend!

mudler

Looking good, I will test as well on my setup once on master. Thanks!

mudler · 2026-01-31T20:52:01Z

CI seems to fail (error looks genuine)


Add Python gRPC backend using WhisperX for speech-to-text with word-level timestamps, forced alignment, and speaker diarization via pyannote-audio when HF_TOKEN is provided. Signed-off-by: eureka928 <[email protected]>

Signed-off-by: eureka928 <[email protected]>


Signed-off-by: eureka928 <[email protected]>

eureka928 · 2026-02-01T02:42:02Z

> CI seems to fail (error looks genuine)

Can you run the CI again?
Thank you

eureka928 · 2026-02-02T03:07:02Z

Hi @mudler, this failure isn't caused by my code changes; it's a pre-existing issue unrelated to this PR.

eureka928 force-pushed the feat/whisperx-backend branch from 4dcf358 to 7bf3852 January 30, 2026 12:57

github-actions bot added the dependencies label Jan 30, 2026

mudler reviewed Jan 30, 2026

backend/python/whisperx/requirements-cpu.txt (outdated, resolved)

mudler reviewed Jan 30, 2026

backend/python/whisperx/requirements-cublas12.txt (outdated, resolved)

mudler reviewed Jan 30, 2026

backend/python/whisperx/requirements-cublas13.txt (outdated, resolved)

eureka928 force-pushed the feat/whisperx-backend branch from 3e5133d to 0d10ffb January 30, 2026 21:08

mudler previously approved these changes Jan 31, 2026

mudler enabled auto-merge (squash) January 31, 2026 15:33

auto-merge was automatically disabled February 1, 2026 02:39
Head branch was pushed to by a user without write access

eureka928 dismissed mudler’s stale review via dad9ac3 February 1, 2026 02:39

eureka928 force-pushed the feat/whisperx-backend branch from dad9ac3 to 17d5edb February 1, 2026 02:40

eureka928 added 7 commits February 1, 2026 03:41

feat(proto): add speaker field to TranscriptSegment for diarization

c3e13b4

Add speaker field to the gRPC TranscriptSegment message and map it through the Go schema, enabling backends to return speaker labels. Signed-off-by: eureka928 <[email protected]>

feat(whisperx): register whisperx backend in Makefile

ad11fdb

Signed-off-by: eureka928 <[email protected]>

feat(whisperx): add whisperx meta and image entries to index.yaml

511b8d9

Signed-off-by: eureka928 <[email protected]>

ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm

06197b5

Signed-off-by: eureka928 <[email protected]>

fix(whisperx): unpin torch versions and use CPU index for cpu requirements

819cab4

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <[email protected]>

fix(whisperx): pin torch ROCm variant to fix CI build failure

3f93840

Signed-off-by: eureka928 <[email protected]>

eureka928 force-pushed the feat/whisperx-backend branch from 17d5edb to 3f93840 February 1, 2026 02:41

mudler added the enhancement (New feature or request) label and removed the dependencies label Feb 1, 2026
