Real-time transcription of a network stream using Microsoft.CognitiveServices.Speech

Question

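We are planning a POC in which we feed a multicast stream of a press conference to a SpeechRecognizer, hoping to get a "real-time" transcript that can then be used for live captioning. So far I see two challenges:

First, I don't know how to "grab" the multicast stream and feed it to the SpeechRecognizer. If anyone is willing to share a code sample showing how to do this (preferably in C#), that would be very helpful.

The other concerns timing. I have done some preliminary tests with microphone input, and when speech is more or less continuous, the service processes rather large chunks of speech at a time, so there is a considerable delay before I get anything back, which is less than ideal in a live-captioning scenario. Is there a setting I can use to change the "granularity" so that I get smaller chunks more frequently?

Any and all input would be greatly appreciated.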

Answer 1

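Sorry, I have no experience with multicast streams.

For speech recognition, you can subscribe to both final and intermediate results during continuous recognition. A final result is produced once the speech recognition engine has recognized a "segment" of speech. Intermediate recognition events arrive more frequently and give you the interim results of the recognition process. These may change while recognition is in progress, but you will see them become more and more "stable" as recognition proceeds.

Wolfgang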
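Because intermediate results can change from one event to the next, a caption display may want to treat only the words shared by two successive hypotheses as settled. The following is a minimal sketch of that idea; `CaptionStability` and `StablePrefix` are hypothetical names introduced here for illustration and are not part of the Speech SDK.

```csharp
using System;

static class CaptionStability
{
    // Returns the leading words that two successive intermediate
    // hypotheses have in common (case-insensitive word comparison).
    public static string StablePrefix(string previous, string current)
    {
        var separators = new[] { ' ' };
        var a = previous.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var b = current.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        int shared = 0;
        while (shared < a.Length && shared < b.Length &&
               string.Equals(a[shared], b[shared], StringComparison.OrdinalIgnoreCase))
        {
            shared++;
        }

        // Join only the first `shared` words of the newer hypothesis.
        return string.Join(" ", b, 0, shared);
    }
}
```

A Recognizing handler could keep the previous `e.Result.Text`, render the stable prefix in normal type, and show the remaining words as tentative until the next event confirms them.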

Answer 2

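As Wolfgang mentioned above, for continuous speech you can subscribe to the Recognizing event to receive periodic updates of the predicted text. The Recognized event fires when the Azure Speech service determines that the speaker has stopped talking.

Example: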

    // Use the default microphone when no WAV file path is given.
    var microphone = string.IsNullOrEmpty(file);
    var audio = microphone
        ? AudioConfig.FromDefaultMicrophoneInput()
        : AudioConfig.FromWavFileInput(file);

    var config = SpeechConfig.FromSubscription(key, region);
    // Pass the audio config; without it the recognizer always uses the default microphone.
    var recognizer = new SpeechRecognizer(config, audio);

    // Recognizing delivers intermediate results; Recognized delivers final ones.
    recognizer.SessionStarted += SessionStarted;
    recognizer.SessionStopped += SessionStopped;
    recognizer.Recognizing += Recognizing;
    recognizer.Recognized += Recognized;
    recognizer.Canceled += Canceled;

    recognizer.StartContinuousRecognitionAsync().Wait();
    if (microphone) { Console.WriteLine("Listening; press ENTER to stop ...\n"); }

    // _values and WaitForContinuousStopCancelKeyOrTimeout are helpers from the surrounding sample (not shown).
    var timeout = _values.GetOrDefault("recognize.timeout", microphone ? 30000 : int.MaxValue);
    WaitForContinuousStopCancelKeyOrTimeout(recognizer, timeout);

    recognizer.StopContinuousRecognitionAsync().Wait();

With event handlers like these:

    private void Recognizing(object sender, SpeechRecognitionEventArgs e)
    {
        Console.WriteLine($"RECOGNIZING: {e.Result.Text}");
    }

    private void Recognized(object sender, SpeechRecognitionEventArgs e)
    {
        var result = e.Result;
        if (result.Reason == ResultReason.RecognizedSpeech && result.Text.Length != 0)
        {
            Console.WriteLine($"RECOGNIZED: {result.Text}");
            Console.WriteLine();
        }
        else if (result.Reason == ResultReason.NoMatch && _verbose)
        {
            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            Console.WriteLine();
        }
    }

When run, as I say "My name is Rob Chambers and this is a test of speech recognition", the output appears quickly (within 700-1000 ms of each word I speak):

    Listening; press ENTER to stop ...

    RECOGNIZING: my
    RECOGNIZING: my name
    RECOGNIZING: my name is
    RECOGNIZING: my name
    RECOGNIZING: my name is
    RECOGNIZING: my name is rob
    RECOGNIZING: my name is rob chambers
    RECOGNIZING: my name is rob chambers and
    RECOGNIZING: my name is rob chambers and this
    RECOGNIZING: my name is rob chambers and this
    RECOGNIZING: my name is rob chambers and this is
    RECOGNIZING: my name is rob chambers and this is
    RECOGNIZING: my name is rob chambers and this is a
    RECOGNIZING: my name is rob chambers and this is a test
    RECOGNIZING: my name is rob chambers and this is a test of
    RECOGNIZING: my name is rob chambers and this is a test of speech
    RECOGNIZING: my name is rob chambers and this is a test of
    RECOGNIZING: my name is rob chambers and this is a test of speech
    RECOGNIZING: my name is rob chambers and this is a test of speech recognition
    RECOGNIZED: My name is Rob Chambers and this is a test of speech recognition.

When I speak almost the same phrase, but as two sentences with a brief pause between them, the output looks like this:

    Listening; press ENTER to stop ...

    RECOGNIZING: my
    RECOGNIZING: my name
    RECOGNIZING: my name is
    RECOGNIZING: my name is
    RECOGNIZING: my name is rob
    RECOGNIZING: my name is rob chambers
    RECOGNIZED: My name is Rob Chambers.

    RECOGNIZING: this
    RECOGNIZING: this is a
    RECOGNIZING: this is a test
    RECOGNIZING: this is a test of
    RECOGNIZING: this is a test of speech
    RECOGNIZING: this is a test of speech recognition
    RECOGNIZED: This is a test of speech recognition.
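Neither answer covers feeding a network stream to the recognizer. One possible approach, sketched here under several assumptions, is to let an external ffmpeg process decode the stream to raw 16 kHz, 16-bit, mono PCM and write that audio into a `PushAudioInputStream`. The names `NetworkStreamCaptioning`, `Pump`, `RunAsync`, and `streamUrl` are hypothetical. The sketch also assumes that ffmpeg is installed on the PATH and can open the stream URL (for example a `udp://` multicast address), which has not been verified against any particular multicast source.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

static class NetworkStreamCaptioning
{
    // Copies bytes from a readable stream to a sink in fixed-size chunks.
    // 3200 bytes is about 100 ms of 16 kHz, 16-bit, mono PCM. Returns bytes copied.
    public static long Pump(Stream source, Action<byte[], int> sink, int chunkSize = 3200)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            sink(buffer, read);
            total += read;
        }
        return total;
    }

    public static async Task RunAsync(string key, string region, string streamUrl)
    {
        // Assumption: ffmpeg is on PATH; it drops video and emits raw PCM on stdout.
        var ffmpeg = Process.Start(new ProcessStartInfo
        {
            FileName = "ffmpeg",
            Arguments = $"-loglevel quiet -i \"{streamUrl}\" -vn -ac 1 -ar 16000 -f s16le -",
            RedirectStandardOutput = true,
            UseShellExecute = false,
        });

        var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
        using (var pushStream = AudioInputStream.CreatePushStream(format))
        using (var audio = AudioConfig.FromStreamInput(pushStream))
        using (var recognizer = new SpeechRecognizer(SpeechConfig.FromSubscription(key, region), audio))
        {
            // Completes when the service reports the end of the session or a cancellation.
            var done = new TaskCompletionSource<bool>();
            recognizer.Recognizing += (s, e) => Console.WriteLine($"RECOGNIZING: {e.Result.Text}");
            recognizer.Recognized += (s, e) => Console.WriteLine($"RECOGNIZED: {e.Result.Text}");
            recognizer.SessionStopped += (s, e) => done.TrySetResult(true);
            recognizer.Canceled += (s, e) => done.TrySetResult(true);

            await recognizer.StartContinuousRecognitionAsync();

            // Forward decoded audio until ffmpeg's output ends, then signal end of stream.
            await Task.Run(() => Pump(ffmpeg.StandardOutput.BaseStream,
                                      (buf, n) => pushStream.Write(buf, n)));
            pushStream.Close();

            await done.Task;
            await recognizer.StopContinuousRecognitionAsync();
        }
    }
}
```

Keeping `Pump` independent of the SDK means the copy loop can be exercised with any `Stream`, and the decoding step can be swapped for a different tool without touching the recognizer setup.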
