Google Speech-to-Text (speech recognition) only recognizes the first few seconds of audio

Question

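I'm using Google's Speech-to-Text API in Node.js. It returns recognition results for the first few words but then ignores the rest of the audio file. For any file I upload, the cutoff is at around 5-7 seconds.

I have tried synchronous speech recognition for shorter audio files (an example using an MP3 file is shown below):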

    const fs = require('fs');
    const speech = require('@google-cloud/speech');

    const filename = './TEST/test.mp3';

    const client = new speech.SpeechClient();

    // Configure the request:
    const config = {
        enableWordTimeOffsets: true,
        sampleRateHertz: 44100,
        encoding: 'MP3',
        languageCode: 'en-US',
    };
    const audio = {
        content: fs.readFileSync(filename).toString('base64'),
    };
    const request = {
        config: config,
        audio: audio,
    };

    // Detect speech in the audio file (this line runs inside an async function)
    const [response] = await client.recognize(request);

I have also tried asynchronous recognition for longer audio files (an example using a WAV file is shown below):

    const fs = require('fs');
    const speech = require('@google-cloud/speech');

    const filename = './TEST/test.wav';

    const client = new speech.SpeechClient();

    // Configure the request:
    const config = {
        enableWordTimeOffsets: true,
        languageCode: 'en-US',
    };
    const audio = {
        content: fs.readFileSync(filename).toString('base64'),
    };
    const request = {
        config: config,
        audio: audio,
    };

    // Start a longRunningRecognize request and wait for it to finish
    // (these lines run inside an async function)
    const [operation] = await client.longRunningRecognize(request);
    const [response] = await operation.promise();

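I have tried both of these implementations with WAV files and with MP3 files. The result is always exactly the same: the first 5 seconds are recognized well, and then nothing after that is recognized.

Any help would be greatly appreciated!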

Answer 1

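@Ricco D is absolutely right, I was printing the results incorrectly...

When you transcribe a longer file, Google Speech-to-Text splits the transcription into segments based on where it detects pauses in the speech.

Your response.results[] array will contain multiple entries, and you need to loop over them to print the full transcript.

See the documentation for more details: https://cloud.google.com/speech-to-text/docs/basics#responses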
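
For illustration, here is a minimal sketch of that loop. The helper name fullTranscript and the sampleResponse object are made up for this example; the sample only imitates the results / alternatives / transcript shape described in the linked documentation, so that the snippet runs without calling the API. In the question's code you would pass in the response returned by client.recognize() or operation.promise() instead.

```javascript
// Joins the top alternative of every result into one transcript string.
// Each entry in response.results covers one consecutive segment of the audio.
function fullTranscript(response) {
  return (response.results || [])
    // Skip any result that carries no alternatives.
    .filter((result) => result.alternatives && result.alternatives.length > 0)
    // The first alternative is the most likely transcription of the segment.
    .map((result) => result.alternatives[0].transcript.trim())
    .join(' ');
}

// Hypothetical response, for illustration only (not real API output).
const sampleResponse = {
  results: [
    { alternatives: [{ transcript: 'hello world', confidence: 0.9 }] },
    { alternatives: [{ transcript: ' this is the second segment', confidence: 0.8 }] },
  ],
};

console.log(fullTranscript(sampleResponse));
```

Printing only response.results[0] would show just the first segment, which matches the symptom described in the question.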

Answer 2

Just modify your code as follows and it will return the full prediction:

    text = r.recognize_google(data, show_all=True)
