I want to create a voice-controlled personal assistant similar to Siri or Alexa: you say a keyword, and the rest of the audio is processed into text. I have a working version that can do this, but if you say the keyword and then pause briefly, it times out. I can't say the keyword, wait one or two seconds, and then say the rest of the command.
I'd like to be able to say the keyword and have it wait 10 or 15 seconds before it actually times out.
I've tried setting these properties, but nothing changed:
SpeechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "15000");
SpeechConfig.SetProperty(PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, "15000");
and
SpeechRecognizer.Properties.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "15000");
SpeechRecognizer.Properties.SetProperty(PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, "15000");
I'm doing the recognition with
SpeechRecognizer.StartKeywordRecognitionAsync()
I tried stopping it with
SpeechRecognizer.StopKeywordRecognitionAsync()
and then calling
SpeechRecognizer.StartContinuousRecognitionAsync()
from each of the SessionStarted, SessionStopped, Recognizing, and Recognized events. The Canceled event is never called.
I expected it to wait after the keyword was spoken, but it doesn't. Does anyone know how to do this? What am I missing?
I was able to figure it out by reading the documentation here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/
The basic approach is to first create a KeywordRecognizer and call its recognize function to capture the keyword. The result is a RecognizedKeyword, and from there you create a SpeechRecognizer. Call its recognize function and you get the rest of the command. The default delay from capturing the keyword until timeout is 30 seconds.
Here is some code as an example:
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace SpeechRecognitionDemo
{
    class Program
    {
        static SpeechConfig speechConfig;
        static KeywordRecognitionModel keywordModel;
        static AudioConfig audioConfig;
        static TaskCompletionSource<int> stopRecognition;

        static async Task Main(string[] args)
        {
            // Creates an instance of a speech config with the specified subscription key and service region.
            // Replace with your own subscription key and service region (e.g., "westus").
            speechConfig = SpeechConfig.FromSubscription("subscription key", "region");
            speechConfig.SpeechRecognitionLanguage = "en-US";

            // Set this property to allow more time between words in the command.
            speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");

            // Creates an instance of a keyword recognition model. Update this to
            // point to the location of your keyword recognition model.
            keywordModel = KeywordRecognitionModel.FromFile("keywords.table");

            audioConfig = AudioConfig.FromDefaultMicrophoneInput();

            await RunAssistant();
        }

        static async Task RunAssistant()
        {
            bool keepRunning = true;
            while (keepRunning)
            {
                // Starts recognizing.
                Console.WriteLine($"Say something starting with the keyword 'Hey Assistant' followed by whatever you want...");
                stopRecognition = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

                using (var keywordRecognizer = new KeywordRecognizer(audioConfig))
                {
                    // Recognize the keyword.
                    KeywordRecognitionResult result = await keywordRecognizer.RecognizeOnceAsync(keywordModel);
                    if (result.Reason == ResultReason.RecognizedKeyword)
                    {
                        Console.WriteLine($"RECOGNIZED KEYWORD: Text={result.Text}");
                        using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
                        {
                            // Subscribes to events.
                            speechRecognizer.Recognizing += (s, e) =>
                            {
                                if (e.Result.Reason == ResultReason.RecognizingSpeech)
                                {
                                    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
                                }
                            };
                            speechRecognizer.Recognized += (s, e) =>
                            {
                                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                                {
                                    Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                                }
                                else if (e.Result.Reason == ResultReason.NoMatch)
                                {
                                    Console.WriteLine("NOMATCH: Speech could not be recognized.");
                                }
                            };
                            speechRecognizer.SessionStarted += (s, e) =>
                            {
                                Console.WriteLine("\nSession started event.\n");
                            };
                            speechRecognizer.SessionStopped += (s, e) =>
                            {
                                Console.WriteLine("\nSession stopped event.");
                                Console.WriteLine("\nStop recognition.");
                                stopRecognition.TrySetResult(0);
                            };

                            // Now recognize the rest of the command.
                            await speechRecognizer.RecognizeOnceAsync();
                        }
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        Console.WriteLine($"CANCELED KEYWORD");
                        stopRecognition.TrySetResult(0);
                    }
                    else if (result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine($"NO MATCH KEYWORD");
                        // Complete the task here too, so the wait below cannot block forever.
                        stopRecognition.TrySetResult(0);
                    }

                    // Use Task.WaitAny to keep the task rooted.
                    Task.WaitAny(new[] { stopRecognition.Task });
                    Console.WriteLine("\n");
                }
            }
            audioConfig.Dispose();
        }
    }
}
You need to create a keywords.table file using Speech Studio, which is self-explanatory. You will also need a subscription ID, and then you download a model for offline use.
This example waits for a keyword, then waits for more text. It prints the result to the console and then goes back and does it all again.
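One note on the timeout properties from the question: in this arrangement they belong on the SpeechConfig used by the post-keyword SpeechRecognizer, not on the KeywordRecognizer, which keeps listening until the keyword is heard. A minimal sketch of tuning them (the property names are from the SDK's PropertyId enum; the millisecond values are illustrative, not recommendations):

```csharp
// Sketch only: set these on the SpeechConfig before constructing the
// SpeechRecognizer that captures the command after the keyword.
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "15000"); // wait up to 15 s for speech to start
speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");              // allow ~2 s pauses between words
```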