Conversation


@nanoandrew4 nanoandrew4 commented Jan 31, 2026

Description

Closes #1071.

This PR adds support for the response_format request parameter in the API's transcription endpoint and in the transcribe CLI command, in accordance with the official OpenAI API (plus the lrc format, which I am particularly interested in :) ). The transcription endpoint's responses now mirror the official API's behaviour, except when the parameter is omitted: in that case we keep the previous behaviour, so existing use cases don't break.

The start/end values for each segment have also been adjusted in the whisper and faster-whisper backends, since they were returning values that, when converted to the time.Duration field in the main application, yielded incorrect values.

In the faster-whisper backend, compute_type was changed from float16 to default: I was mistakenly running the model on CPU and it failed because my CPU does not support float16 properly. Once the model was configured to use cuda it worked fine, but this means faster-whisper currently won't work on some CPUs. default works on all devices; we could add a check for the f16 config if we want to re-enable float16 support in this backend, either in this PR or in a separate one.

The tests were updated to use the official OpenAI Go client, since the client used before did not properly support the response_format request parameter.
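In the OpenAI API, response_format travels as a field in the multipart form body of POST /v1/audio/transcriptions. A self-contained sketch of building such a request with only the Go standard library; the endpoint path and the "whisper-1" model name follow the OpenAI API docs, and LocalAI's accepted model names may differ:

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// buildTranscriptionRequest assembles a multipart transcription request
// carrying the response_format field. The audio payload here is a stub.
func buildTranscriptionRequest(url, format string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	fw, err := w.CreateFormFile("file", "audio.wav")
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write(audio); err != nil {
		return nil, err
	}
	if err := w.WriteField("model", "whisper-1"); err != nil {
		return nil, err
	}
	if err := w.WriteField("response_format", format); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := buildTranscriptionRequest("http://localhost:8080/v1/audio/transcriptions", "lrc", []byte("fake-audio"))
	if err != nil {
		panic(err)
	}
	// Parse our own request back to show the field is present.
	if err := req.ParseMultipartForm(1 << 20); err != nil {
		panic(err)
	}
	fmt.Println(req.MultipartForm.Value["response_format"][0]) // lrc
}
```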

I also took the liberty of restricting certain CI workflows to run only on the main repo, since forks do not have the credentials to run them correctly, nor should they. If you'd rather I drop these changes, I'll undo them; it's just a convenience to avoid the email spam from the pipelines that constantly fail.
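Restricting a GitHub Actions workflow to the upstream repository is typically done with a job-level condition; a sketch of the pattern (the job name and steps are illustrative, and the `mudler/LocalAI` slug is assumed from context):

```yaml
jobs:
  release:
    # Skip on forks, which lack the secrets this job needs.
    if: github.repository == 'mudler/LocalAI'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```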

Notes for Reviewers
These changes were tested against the whisper and faster-whisper backends. I was unable to test with qwen-asr.
The AIO tests were also run successfully.

Signed commits

  • Yes, I signed my commits.

nanoandrew4 and others added 3 commits January 31, 2026 22:01
(cherry picked from commit e271dd7)
Signed-off-by: Andres Smith <[email protected]>
(cherry picked from commit 6a93a8f)
Signed-off-by: Andres Smith <[email protected]>

netlify bot commented Jan 31, 2026

Deploy Preview for localai ready!

🔨 Latest commit: 945ab38
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/697f10a64527d40008f181d4
😎 Deploy Preview: https://deploy-preview-8318--localai.netlify.app

@nanoandrew4 nanoandrew4 changed the title Add transcribe response format request parameter & adjust STT backends feat(api): Add transcribe response format request parameter & adjust STT backends Jan 31, 2026
@nanoandrew4 nanoandrew4 changed the title feat(api): Add transcribe response format request parameter & adjust STT backends feat(api)! Add transcribe response format request parameter & adjust STT backends Jan 31, 2026
@nanoandrew4 nanoandrew4 changed the title feat(api)! Add transcribe response format request parameter & adjust STT backends feat(api)!: Add transcribe response format request parameter & adjust STT backends Jan 31, 2026
…lso work on CLI

Signed-off-by: Andres Smith <[email protected]>
(cherry picked from commit 69a9397)
Signed-off-by: Andres Smith <[email protected]>
@nanoandrew4 nanoandrew4 force-pushed the transcribe-response-format branch from 77bf6a3 to f0e5b46 Compare January 31, 2026 22:28
@mudler mudler changed the title feat(api)!: Add transcribe response format request parameter & adjust STT backends feat(api): Add transcribe response format request parameter & adjust STT backends Feb 1, 2026
Owner

mudler commented Feb 1, 2026

> Description
>
> Closes #1071.
>
> This PR adds support for the response_format request parameter in the API's transcription endpoint and in the transcribe CLI command, in accordance with the official OpenAI API (plus the lrc format, which I am particularly interested in :) ). The transcription endpoint's responses now mirror the official API's behaviour, except when the parameter is omitted: in that case we keep the previous behaviour, so existing use cases don't break.
>
> The start/end values for each segment have also been adjusted in the whisper and faster-whisper backends, since they were returning values that, when converted to the time.Duration field in the main application, yielded incorrect values.
>
> In the faster-whisper backend, compute_type was changed from float16 to default: I was mistakenly running the model on CPU and it failed because my CPU does not support float16 properly. Once the model was configured to use cuda it worked fine, but this means faster-whisper currently won't work on some CPUs. default works on all devices; we could add a check for the f16 config if we want to re-enable float16 support in this backend, either in this PR or in a separate one.

I'm totally fine to do it in a separate PR, for now it's looking good!

> The tests were updated to use the official OpenAI Go client, since the client used before did not properly support the response_format request parameter.

That's nice, thank you for switching to the official client (we should probably do the same across the codebase).

> I also took the liberty of restricting certain CI workflows to run only on the main repo, since forks do not have the credentials to run them correctly, nor should they. If you'd rather I drop these changes, I'll undo them; it's just a convenience to avoid the email spam from the pipelines that constantly fail.

I don't see these changes in the PR - is this intentional?

In any case, it looks good here - thanks!

@mudler mudler enabled auto-merge (squash) February 1, 2026 08:18
@mudler mudler added the enhancement (New feature or request) label and removed the dependencies label Feb 1, 2026


Development

Successfully merging this pull request may close these issues.

Audio-to-text support for subtitling audio for media