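Questions tagged speech-to-text

Converting spoken language into text. Possible synonyms include automatic speech recognition, ASR, computer speech recognition, speech to text, and STT.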

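I am using a phone number voice webhook that returns the following TwiML response:

    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <Stream url="wss://..."/>
      </Connect>
      <Gather speechTimeout="auto" speechModel="phone_call" enhanced="true" input="speech" action="/respond"/>
    </Response>

It starts the bidirectional voice Stream correctly, with no issues. It is able to connect, send data, and disconnect. But it never makes the "/respond" request from the Gather part. If I remove the Stream connect part and update the TwiML to:

    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Gather speechTimeout="auto" speechModel="phone_call" enhanced="true" input="speech" action="/respond"/>
    </Response>

then Gather is called. But why is it not called with the bidirectional Stream?

Q. What do we want? Either:
- Do it entirely through the stream? I have trouble getting the StreamId, ConnectionId, and CallId in one place.
- Use Gather as I do? Here, with the bidirectional Stream, Gather is not even called for some reason.

Q. Why use Gather? Currently we rely on the already-trained speechTimeout and speech model to get a cue when the user stops speaking. In the Gather step we make a request to another API endpoint, and with the help of the "StreamId", "ConnectionId", and "CallId" we send the voice response as stream output.

The behavior you describe, where Stream works but Gather does not, is by design of the TwiML you are using. Twilio processes TwiML sequentially and does not move on until a verb has completed. The verbs in your TwiML are Connect and Gather. The Gather comes after the Stream, so it is only reached once Connect has finished:

    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <Stream url="wss://..."/>
      </Connect>
      <Gather speechTimeout="auto" speechModel="phone_call" enhanced="true" input="speech" action="/respond"/>
    </Response>

An alternative is to use only the Gather TwiML and then use the Twilio REST API to handle Media Streams:

    using System;
    using Twilio;
    using Twilio.Rest.Api.V2010.Account.Call;

    // Read the credentials from the environment rather than hard-coding them.
    string accountSid = Environment.GetEnvironmentVariable("TWILIO_ACCOUNT_SID");
    string authToken = Environment.GetEnvironmentVariable("TWILIO_AUTH_TOKEN");
    TwilioClient.Init(accountSid, authToken);

    // Start a media stream on the call identified by pathCallSid.
    var stream = StreamResource.Create(
        url: new Uri("wss://example.com/"),
        pathCallSid: "CAXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    );

Your application needs to obtain the pathCallSid from the call that is in Gather mode and then use the Twilio REST API to start a media stream for that call. One problem with this approach is that Gather seems best suited to small portions of a call.

To address the other question you raised ("Do it entirely through the stream? I have trouble getting the StreamId, ConnectionId, and CallId in one place."), look at the statusCallback parameter when creating the stream:

    The statusCallback attribute takes an absolute or relative URL as its value. Whenever a stream is started or stopped, Twilio sends a request to this URL.

For example:

    <Stream url="wss://..." statusCallback="http://yourapi.com..." />

The parameters sent to the statusCallback URL include StreamSid and CallSid.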

stream twilio text-to-speech speech-to-text twilio-twiml

1 answer, 0 votes
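
For the Twilio question above, the answer notes that the statusCallback request carries StreamSid and CallSid. A minimal Python sketch of reading both values in one place, assuming the callback arrives as a form-encoded request body (the function name and the sample identifiers are hypothetical, not part of the original post):

```python
from urllib.parse import parse_qs


def extract_stream_ids(form_body: str) -> dict:
    """Pull StreamSid and CallSid out of a form-encoded callback body."""
    fields = parse_qs(form_body)
    # parse_qs maps each key to a list of values; take the first one if present.
    return {
        "stream_sid": fields.get("StreamSid", [None])[0],
        "call_sid": fields.get("CallSid", [None])[0],
    }


if __name__ == "__main__":
    # Hypothetical callback body with placeholder identifiers.
    sample = "CallSid=CA123&StreamSid=MZ456"
    print(extract_stream_ids(sample))
```

A web handler for the statusCallback URL could call this function on the raw request body and store the pair, so that later steps can look up the stream that belongs to a call.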

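How can I determine which speech-to-text model is being used?

I have a Dialogflow ES bot in which Google converts speech to text; this happens before the Dialogflow request is sent to the Dialogflow fulfillment. We have automatic speech adaptation set up in Dialogflow ES...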

dialogflow-es speech-to-text

1 answer, 0 votes

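Trying to compile the Azure Speech samples with devtoolset-8

I am trying to compile the Azure Linux C++ samples with devtoolset-8: scl enable devtoolset-8 "make" But I am missing a lot of dependencies, and I can't find a solution either. Error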

c++ azure speech-to-text

1 answer, 0 votes

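Problem getting text with Selenium (Python)

I am trying to build speech-to-text for my AI voice assistant using Selenium. So far everything has gone well, except that I cannot get the output text. This is the website I am using...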

python selenium-webdriver speech-to-text

1 answer, 0 votes

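S2S Translation works on Windows but not on Linux

I am using a Microsoft speech translation model. I used it on Windows and it worked well. When I run it on Linux, it gives me multiple errors. Most of them were due to dependencies that I added...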

python-3.x azure ubuntu azure-cognitive-services speech-to-text

1 answer, 0 votes

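Azure speech-to-text and TTS is talking to itself

Hopefully this is an "oh, that's simpler than I thought" question, but I can't seem to do duplex text-to-speech and speech-to-text with Azure C# without it hearing the "speaking"...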

c# azure speech-recognition text-to-speech speech-to-text

1 answer, 0 votes

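Flutter: audio comes from the earpiece on iOS when speech-to-text is initialized

In my iOS app I use Flutter TTS for audio output and speech-to-text for input. Only when STT is initialized, the volume becomes very low because the audio comes from the earpiece instead of...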

ios flutter speech-recognition text-to-speech speech-to-text

1 answer, 0 votes

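Speech recognizer not working: PCM WAV, AIFF/AIFF-C, or Native FLAC error

I have the following audio file, which says the word "down", and I am trying to convert the audio to text: import speech_recognition as sr r = sr.Recognizer() with sr.AudioFile("example.wav") as source: ...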

speech-recognition wav speech-to-text

1 answer, 0 votes
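
For the WAV error above, one way to narrow the problem down is to check whether the file really is an uncompressed PCM WAV before handing it to the recognizer. A minimal sketch using only Python's standard-library wave module; the helper name is hypothetical, and the check says nothing about AIFF or FLAC files:

```python
import wave


def is_pcm_wav(path: str) -> bool:
    """Return True if the standard-library wave module can open the file."""
    try:
        with wave.open(path, "rb") as wav_file:
            # A readable header with at least one channel is enough for this check.
            return wav_file.getnchannels() > 0
    except (wave.Error, EOFError):
        # wave.Error: not a RIFF/WAVE file, or an encoding the module does not support.
        # EOFError: the file is too short to contain a header.
        return False
```

If this returns False for a file with a .wav extension, the file is probably in another container or a compressed encoding and would need converting (for example with ffmpeg) first.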

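speech_recognition.listen() does not work

Whenever I try to run the listen function of speech recognition, it gets stuck there. I tried debugging it, but it still hangs at that point, and while it hangs the cmd line does not print anything either...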

python speech-recognition speech-to-text

1 answer, 0 votes

Python speech recognition gets stuck (Mac)

Here is my code: import speech_recognition as sr r = sr.Recognizer() with sr.Microphone() as source: print("Go ahead!") audio = r.listen(source) try: text = r.

python speech-recognition speech-to-text

3 answers, 0 votes

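How do I run Whisper on an entire directory?

I want to use Whisper to transcribe speech to text. I have been able to run it successfully on a single file with the following command: whisper audio.wav I want to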

audio speech-to-text openai-whisper

2 answers, 0 votes
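
For the Whisper question above, one possible approach is to repeat the single-file command once per audio file in the directory. A minimal Python sketch that only builds the commands; the function name and the list of extensions are assumptions, and nothing here is taken from the original answers:

```python
import shlex
import sys
from pathlib import Path
from typing import List

# Assumed set of audio extensions; adjust to the files actually present.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac"}


def build_whisper_commands(directory: str) -> List[List[str]]:
    """Return one `whisper <file>` argument list per audio file in `directory`."""
    files = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    return [["whisper", str(p)] for p in files]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for command in build_whisper_commands(target):
        # Print the command; pass it to subprocess.run(command, check=True) to execute it.
        print(shlex.join(command))
```

Building the argument lists separately from running them makes it easy to review what will be executed before starting a long transcription job.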

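Android speech recognition: how do I handle hyphens, compound words, apostrophes, etc. when matching spoken words against stored words?

I am trying to implement speech recognition in Android. I have a paragraph set in a TextView, and an ArrayList of its words split on " " (space). I am using onPartialResult to get...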

java android kotlin speech-recognition speech-to-text

1 answer, 0 votes
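
For the word-matching question above, a common idea is to normalize both the spoken and the stored word before comparing them. The sketch below is written in Python for brevity (the same logic ports to Java or Kotlin); the function names are hypothetical, and stripping every non-alphanumeric character is an assumption that may be too aggressive for some texts:

```python
import re


def normalize_word(word: str) -> str:
    """Lowercase a word and drop hyphens, apostrophes, spaces, and punctuation."""
    # [\W_] matches every character that is not a letter or digit.
    return re.sub(r"[\W_]", "", word.lower())


def words_match(spoken: str, stored: str) -> bool:
    """Compare two words after normalization."""
    return normalize_word(spoken) == normalize_word(stored)
```

With this, a recognizer result such as "dont" or "well known" would compare equal to the stored "don't" or "well-known".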

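Why is Word Information Lost (WIL) calculated this way?

Word Information Lost (WIL) is a measure of the performance of an automatic speech recognition (ASR) service (e.g. AWS Transcribe, Google Speech-to-Text, etc.) against a gold standard (usually human-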

speech-recognition speech-to-text speech automatic-speech-recognition

1 answer, 0 votes
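
The excerpt above is cut off before it states the formula. Assuming the commonly cited definition, WIL = 1 - (H / N_ref) * (H / N_hyp), where H is the number of correctly recognized words, N_ref the number of words in the reference, and N_hyp the number of words in the ASR output, a minimal Python sketch from alignment counts looks like this (the function name is hypothetical):

```python
def word_information_lost(hits: int, substitutions: int,
                          deletions: int, insertions: int) -> float:
    """Compute WIL = 1 - (H / N_ref) * (H / N_hyp) from alignment counts."""
    reference_length = hits + substitutions + deletions    # words in the reference
    hypothesis_length = hits + substitutions + insertions  # words in the ASR output
    if reference_length == 0 or hypothesis_length == 0:
        raise ValueError("both transcripts must contain at least one word")
    return 1.0 - (hits / reference_length) * (hits / hypothesis_length)
```

For example, three hits and one substitution in a four-word reference give 1 - 0.75 * 0.75 = 0.4375, and a perfect transcript gives 0.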

What is the difference between Google Cloud Speech-to-Text V1 and V2?

My project is built with Spring Boot and Maven, and the google-cloud-speech version is:

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-speech</artifactId>
      <version>1.24.3</version>
    </dependency>

When I use 2.3.0, the code gives me an error.

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-speech</artifactId>
      <version>2.3.0</version>
    </dependency>

This is the error message:

    Exception in thread "pool-4-thread-1" Exception in thread "pool-3-thread-1" java.lang.NoSuchMethodError: io.grpc.internal.AbstractManagedChannelImplBuilder: method <init>()V not found
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.<init>(NettyChannelBuilder.java:200)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.forTarget(NettyChannelBuilder.java:169)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.forAddress(NettyChannelBuilder.java:152)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress(NettyChannelProvider.java:38)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress(NettyChannelProvider.java:24)
        at io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:350)
        at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:105)
        at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:83)
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:236)
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:230)
        at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:201)
        at com.google.cloud.speech.v1p1beta1.stub.GrpcSpeechStub.create(GrpcSpeechStub.java:95)
        at com.google.cloud.speech.v1p1beta1.stub.SpeechStubSettings.createStub(SpeechStubSettings.java:133)
        at com.google.cloud.speech.v1p1beta1.SpeechClient.<init>(SpeechClient.java:134)
        at com.google.cloud.speech.v1p1beta1.SpeechClient.create(SpeechClient.java:116)
        at com.google.cloud.speech.v1p1beta1.SpeechClient.create(SpeechClient.java:108)
        at com.duplicall.ibaeonline.EngineClients.google.InfiniteStreamRecognize.infiniteStreamingRecognize(InfiniteStreamRecognize.java:95)
        at com.duplicall.ibaeonline.EngineClients.google.InfiniteStreamRecognize.lambda$StartTrans$0(InfiniteStreamRecognize.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    java.lang.NoSuchMethodError: io.grpc.internal.AbstractManagedChannelImplBuilder: method <init>()V not found
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.<init>(NettyChannelBuilder.java:200)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.forTarget(NettyChannelBuilder.java:169)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.forAddress(NettyChannelBuilder.java:152)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress(NettyChannelProvider.java:38)
        at io.grpc.netty.shaded.io.grpc.netty.NettyChannelProvider.builderForAddress(NettyChannelProvider.java:24)
        at io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:350)
        at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:105)
        at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:83)
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:236)
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:230)
        at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:201)
        at com.google.cloud.speech.v1p1beta1.stub.GrpcSpeechStub.create(GrpcSpeechStub.java:95)
        at com.google.cloud.speech.v1p1beta1.stub.SpeechStubSettings.createStub(SpeechStubSettings.java:133)
        at com.google.cloud.speech.v1p1beta1.SpeechClient.<init>(SpeechClient.java:134)
        at com.google.cloud.speech.v1p1beta1.SpeechClient.create(SpeechClient.java:116)
        at com.google.cloud.speech.v1p1beta1.SpeechClient.create(SpeechClient.java:108)
        at com.duplicall.ibaeonline.EngineClients.google.InfiniteStreamRecognize.infiniteStreamingRecognize(InfiniteStreamRecognize.java:95)
        at com.duplicall.ibaeonline.EngineClients.google.InfiniteStreamRecognize.lambda$StartTrans$0(InfiniteStreamRecognize.java:291)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I don't know the difference between V1 and V2. I assumed that a google-cloud-speech version upgrade would be backward compatible. If you are familiar with the product, please comment. Looking at the official code:

V1: https://github.com/googleapis/google-cloud-java/blob/main/java-speech/samples/snippets/generated/com/google/cloud/speech/v1/speech/streamingrecognize/AsyncStreamingRecognize.java

    public static void asyncStreamingRecognize() throws Exception {
      // This snippet has been automatically generated and should be regarded as a code template only.
      // It will require modifications to work:
      // - It may require correct/in-range values for request initialization.
      // - It may require specifying regional endpoints when creating the service client as shown in
      // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
      try (SpeechClient speechClient = SpeechClient.create()) {
        BidiStream<StreamingRecognizeRequest, StreamingRecognizeResponse> bidiStream =
            speechClient.streamingRecognizeCallable().call();
        StreamingRecognizeRequest request = StreamingRecognizeRequest.newBuilder().build();
        bidiStream.send(request);
        for (StreamingRecognizeResponse response : bidiStream) {
          // Do something when a response is received.
        }
      }
    }

V2: https://github.com/googleapis/google-cloud-java/blob/main/java-speech/samples/snippets/generated/com/google/cloud/speech/v2/speech/streamingrecognize/AsyncStreamingRecognize.java

    public static void asyncStreamingRecognize() throws Exception {
      // This snippet has been automatically generated and should be regarded as a code template only.
      // It will require modifications to work:
      // - It may require correct/in-range values for request initialization.
      // - It may require specifying regional endpoints when creating the service client as shown in
      // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
      try (SpeechClient speechClient = SpeechClient.create()) {
        BidiStream<StreamingRecognizeRequest, StreamingRecognizeResponse> bidiStream =
            speechClient.streamingRecognizeCallable().call();
        StreamingRecognizeRequest request =
            StreamingRecognizeRequest.newBuilder()
                .setRecognizer(
                    RecognizerName.of("[PROJECT]", "[LOCATION]", "[RECOGNIZER]").toString())
                .build();
        bidiStream.send(request);
        for (StreamingRecognizeResponse response : bidiStream) {
          // Do something when a response is received.
        }
      }
    }

Does this have anything to do with the version that Maven references?

speech-to-text google-cloud-speech google-speech-to-text-api

1 answer, 0 votes
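
On the Maven question above: a NoSuchMethodError means that a class loaded at runtime differs from the version the calling code was compiled against. One possible cause, which the question does not confirm, is that io.grpc artifacts of different versions ended up on the classpath. A minimal Python sketch that scans the text output of `mvn dependency:tree` for that situation; the function names and the sample coordinates in the usage note are hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches coordinates of the form "io.grpc:<artifact>:jar:<version>:<scope>".
COORDINATE = re.compile(r"io\.grpc:([\w.-]+):jar:([\w.-]+):\w+")


def grpc_versions(tree_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each io.grpc artifact found in the tree output to the versions seen."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for line in tree_lines:
        match = COORDINATE.search(line)
        if match:
            versions[match.group(1)].add(match.group(2))
    return dict(versions)


def has_mixed_grpc_versions(tree_lines: Iterable[str]) -> bool:
    """Return True if more than one distinct io.grpc version appears."""
    distinct = set()
    for seen in grpc_versions(tree_lines).values():
        distinct |= seen
    return len(distinct) > 1
```

Feeding it lines such as "[INFO] +- io.grpc:grpc-netty-shaded:jar:1.41.0:compile" and "[INFO] \- io.grpc:grpc-core:jar:1.30.0:compile" would report a mix, which would be a hint to align the gRPC versions (for example through dependency management) before comparing the V1 and V2 APIs themselves.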

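Dialogflow email address by voice

Does anyone have suggestions for capturing a user's email address by voice? In writing it is very simple, because email addresses follow a pattern to some degree, but with voice it is quite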

google-cloud-platform dialogflow-es speech-to-text

2 answers, 0 votes

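Speaker diarization with the Google Speech-to-Text API using the Python .long_running_recognize() method

I am following the answer to this question. But my audio is longer than 1 minute, so I have to use the .long_running_recognize(config, audio) method instead of .recognize(config, audio). Here is the code: from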

python google-cloud-platform audio speech-to-text diarization

1 answer, 0 votes

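Whisper AI - speech-to-text model

How can I fix this error? I have already installed ffmpeg. NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value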