Conversation


@nanoandrew4 nanoandrew4 commented Jan 31, 2026

Description

Closes #1071.

This PR adds support for the response_format request parameter in the API's transcription endpoint and in the transcribe CLI command, in accordance with the official OpenAI API (plus the lrc format, which I am particularly interested in :) ). The transcription endpoint's responses now mirror the official API's behaviour, except when the parameter is omitted: in that case we keep the previous behaviour, so existing use cases don't break.

The start/end values for each segment have also been adjusted in the whisper and faster-whisper backends, since they were returning values that, when converted to the time.Duration field in the main application, yielded incorrect values.

In the faster-whisper backend, compute_type was changed from float16 to default: I was mistakenly running the model on CPU and it failed because my CPU does not support float16 properly. Once the model was configured to use cuda it worked fine, but this means faster-whisper currently won't work on some CPUs. default works on all devices; we could add a check for the f16 config if we want to re-enable float16 support in this backend, either in this PR or in a separate one.

The tests were updated to use the official OpenAI Go client, since the client used before did not properly support the response_format request parameter.
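In the OpenAI API, response_format travels as a field in the multipart form body of POST /v1/audio/transcriptions. A self-contained sketch of building such a request with only the Go standard library; the endpoint path and the "whisper-1" model name follow the OpenAI API docs, and LocalAI's accepted model names may differ:

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// buildTranscriptionRequest assembles a multipart transcription request
// carrying the response_format field. The audio payload here is a stub.
func buildTranscriptionRequest(url, format string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	fw, err := w.CreateFormFile("file", "audio.wav")
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write(audio); err != nil {
		return nil, err
	}
	if err := w.WriteField("model", "whisper-1"); err != nil {
		return nil, err
	}
	if err := w.WriteField("response_format", format); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := buildTranscriptionRequest("http://localhost:8080/v1/audio/transcriptions", "lrc", []byte("fake-audio"))
	if err != nil {
		panic(err)
	}
	// Parse our own request back to show the field is present.
	if err := req.ParseMultipartForm(1 << 20); err != nil {
		panic(err)
	}
	fmt.Println(req.MultipartForm.Value["response_format"][0]) // lrc
}
```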

I also took the liberty of restricting certain CI workflows to run only on the main repo, since forks do not have the credentials to run them correctly, nor should they. If you'd rather I drop these changes, I'll undo them; it's just a convenience to avoid the email spam from the pipelines that constantly fail.
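Restricting a GitHub Actions workflow to the upstream repository is typically done with a job-level condition; a sketch of the pattern (the job name and steps are illustrative, and the `mudler/LocalAI` slug is assumed from context):

```yaml
jobs:
  release:
    # Skip on forks, which lack the secrets this job needs.
    if: github.repository == 'mudler/LocalAI'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```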

Notes for Reviewers
These changes were tested against the whisper and faster-whisper backends. I was unable to test with qwen-asr.
The AIO tests were also run successfully.

Signed commits

  • Yes, I signed my commits.

nanoandrew4 and others added 3 commits January 31, 2026 22:01
(cherry picked from commit e271dd7)
Signed-off-by: Andres Smith <[email protected]>
(cherry picked from commit 6a93a8f)
Signed-off-by: Andres Smith <[email protected]>

netlify bot commented Jan 31, 2026

Deploy Preview for localai ready!

🔨 Latest commit: 945ab38
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/697f10a64527d40008f181d4
😎 Deploy Preview: https://deploy-preview-8318--localai.netlify.app

@nanoandrew4 nanoandrew4 changed the title Add transcribe response format request parameter & adjust STT backends feat(api): Add transcribe response format request parameter & adjust STT backends Jan 31, 2026
@nanoandrew4 nanoandrew4 changed the title feat(api): Add transcribe response format request parameter & adjust STT backends feat(api)! Add transcribe response format request parameter & adjust STT backends Jan 31, 2026
@nanoandrew4 nanoandrew4 changed the title feat(api)! Add transcribe response format request parameter & adjust STT backends feat(api)!: Add transcribe response format request parameter & adjust STT backends Jan 31, 2026
…lso work on CLI

Signed-off-by: Andres Smith <[email protected]>
(cherry picked from commit 69a9397)
Signed-off-by: Andres Smith <[email protected]>
@nanoandrew4 nanoandrew4 force-pushed the transcribe-response-format branch from 77bf6a3 to f0e5b46 Compare January 31, 2026 22:28
@mudler mudler changed the title feat(api)!: Add transcribe response format request parameter & adjust STT backends feat(api): Add transcribe response format request parameter & adjust STT backends Feb 1, 2026
Owner

mudler commented Feb 1, 2026

> Description
>
> Closes #1071.
>
> This PR adds support for the response_format request parameter in the API's transcription endpoint and in the transcribe CLI command, in accordance with the official OpenAI API (plus the lrc format, which I am particularly interested in :) ). The transcription endpoint's responses now mirror the official API's behaviour, except when the parameter is omitted: in that case we keep the previous behaviour, so existing use cases don't break.
>
> The start/end values for each segment have also been adjusted in the whisper and faster-whisper backends, since they were returning values that, when converted to the time.Duration field in the main application, yielded incorrect values.
>
> In the faster-whisper backend, compute_type was changed from float16 to default: I was mistakenly running the model on CPU and it failed because my CPU does not support float16 properly. Once the model was configured to use cuda it worked fine, but this means faster-whisper currently won't work on some CPUs. default works on all devices; we could add a check for the f16 config if we want to re-enable float16 support in this backend, either in this PR or in a separate one.

I'm totally fine to do it in a separate PR, for now it's looking good!

> The tests were updated to use the official OpenAI Go client, since the client used before did not properly support the response_format request parameter.

That's nice, thank you for switching to the official client (we should probably do the same across the codebase).

> I also took the liberty of restricting certain CI workflows to run only on the main repo, since forks do not have the credentials to run them correctly, nor should they. If you'd rather I drop these changes, I'll undo them; it's just a convenience to avoid the email spam from the pipelines that constantly fail.

I don't see these changes in the PR - is this intentional?

In any case, it looks good here - thanks!

@mudler mudler enabled auto-merge (squash) February 1, 2026 08:18
@mudler mudler added the enhancement (New feature or request) label and removed the dependencies label Feb 1, 2026


Development

Successfully merging this pull request may close these issues.

Audio-to-text support for subtitling audio for media