
Medium.en model just outputting "Okay" for every second in the audio while the base.en model works well #719

Closed
bangpradyumna opened this issue Apr 5, 2023 · 6 comments

@bangpradyumna

Hello Everyone,
I have a recording that I'm trying to transcribe. I first tried the base.en model, which worked fine but not perfectly. I then tried the medium.en model, but it just outputs "Okay" for every second of the audio.

There are 5 or 6 "Okay"s in the audio, but the medium.en model just keeps outputting "Okay" even for lines that the base.en model is able to transcribe.

Screenshot of the base.en model's output, which works well:
[screenshot]

Screenshot of the medium.en model's output:
[screenshot]

Any idea what I might be doing wrong?

@carlosbaraza

I'm having the same problem:

[00:37:36.000 --> 00:38:02.000]   >> Thank you.
[00:38:02.000 --> 00:38:28.000]   >> Thank you.
[00:38:28.000 --> 00:38:54.000]   >> Thank you.
[... the same ">> Thank you." segment repeats continuously ...]
[01:04:32.000 --> 01:04:52.000]   >> Thank you.

@abelbabel

I have the same issue. It does not seem to be related to a specific model, and it does not happen with every input file.

@abelbabel

abelbabel commented Apr 11, 2023

Similar to #731 and #612.

ggerganov added the decoding (Decoding related issues) label Apr 14, 2023
@ggerganov
Owner

I've disabled the decoder fallbacks because the current implementation is very inefficient.
This will be resolved some time in the future.
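
For context, "decoder fallback" here refers to the temperature-fallback strategy of Whisper-style decoding: decode a segment at temperature 0 first, and only if the result looks degenerate (highly repetitive text or low average token log-probability) retry at a higher temperature. Below is a minimal sketch of that idea, not whisper.cpp's actual implementation; `decode_segment` and the exact thresholds (taken from the reference Whisper inference defaults) are illustrative assumptions:

```cpp
#include <functional>
#include <string>
#include <vector>

struct DecodeResult {
    std::string text;
    float avg_logprob;        // mean log-probability of the sampled tokens
    float compression_ratio;  // text length / zlib-compressed length (high => repetitive)
};

// `decode_segment` stands in for the actual decoder (greedy or beam search);
// it is an illustrative assumption, not a whisper.cpp API.
DecodeResult decode_with_fallback(
        const std::vector<float> & mel,
        const std::function<DecodeResult(const std::vector<float> &, float)> & decode_segment) {
    DecodeResult res{};
    // temperatures 0.0, 0.2, 0.4, ..., 1.0
    for (int i = 0; i <= 5; ++i) {
        const float temperature = 0.2f * i;
        res = decode_segment(mel, temperature);

        const bool looks_degenerate =
            res.compression_ratio > 2.4f ||  // repetitive output such as "Okay. Okay. Okay."
            res.avg_logprob      < -1.0f;    // low-confidence tokens

        if (!looks_degenerate) {
            break; // accept this decode; otherwise fall back to a higher temperature
        }
    }
    return res;
}
```

With the fallbacks disabled, only the first (temperature 0) decode runs, so a degenerate, repetitive result like the ones above is returned as-is.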

@abelbabel

It turned out that in one case the section where multiple "Okay"s were "hallucinated" was loud rumbling / noise (no speech). I isolated this part and it was transcribed correctly. After that I passed one detected noise output (like "(pages rustling)") as the prompt parameter, and the original file was then transcribed properly (see the sketch below).

This of course does not work at scale, but maybe it gives an idea of where the problem lies.
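
A hedged sketch of this workaround through the whisper.cpp C API, assuming the `initial_prompt` field of `whisper_full_params` (recent versions expose it; the `main` example exposes the same thing as the `--prompt` option). The model path, the audio loading and the exact prompt text are placeholders:

```cpp
#include "whisper.h"

#include <cstdio>
#include <vector>

int main() {
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-medium.en.bin");
    if (!ctx) {
        return 1;
    }

    // pcmf32 would normally be filled with 16 kHz mono float samples of the recording
    std::vector<float> pcmf32;

    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.initial_prompt = "(pages rustling)"; // the detected noise annotation, used as the prompt

    if (whisper_full(ctx, params, pcmf32.data(), (int) pcmf32.size()) != 0) {
        whisper_free(ctx);
        return 1;
    }

    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```

Biasing the decoder with a non-speech annotation like this seems to make it less eager to hallucinate filler text over noise-only sections, which points at the decoding/fallback logic rather than at a particular model.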

ggerganov added a commit that referenced this issue Apr 15, 2023
I disabled this because there were many complaints about slow decoding.
The current implementation does not allow batching the decoders when
using the "best of" or "beam size" parameters, so the decoding time is
proportional to the number of decoders, which is obviously not great.

However, now there are even more complaints about wrong decodings and
repetition.

So, making a compromise by re-enabling the fallbacks, but defaulting to
just 2 "best of" / "beam size" decoders. Also, the temperature step is
increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum
of 2.

Also, the stream example now has fallbacks enabled by default.

close #471 #477 #508 #612 #719 #731
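
A hedged sketch of what these defaults correspond to when setting the decoding parameters explicitly through the whisper.cpp C API (the relevant `whisper_full_params` fields; the `main` example exposes the first two as `--best-of` and `--beam-size`). The exact defaults may differ between versions:

```cpp
#include "whisper.h"

struct whisper_full_params params_with_fallbacks() {
    // beam-search strategy; WHISPER_SAMPLING_GREEDY with greedy.best_of works analogously
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);

    params.greedy.best_of        = 2;     // candidates per temperature > 0 sampling pass
    params.beam_search.beam_size = 2;     // beam width for the temperature == 0 pass
    params.temperature_inc       = 0.4f;  // temperature step between fallback attempts (0.0 -> 0.4 -> 0.8)

    // quality thresholds that trigger a fallback to the next temperature
    params.entropy_thold = 2.4f;   // entropy of the decoded tokens (catches repetitive output)
    params.logprob_thold = -1.0f;  // average token log-probability

    return params;
}
```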
@ggerganov
Owner

Should be resolved via f19e23f
