Determining the microphone source for Azure Cognitive Services with Python

Question

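I have set up an Azure Cognitive Services instance that listens for a key phrase through the microphone. This works fine, but I cannot tell it to listen on a specific microphone on my Apple Mac.

My code is: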

mic_name = self.preferences.get('mic_name', None)
self.audio_config = speechsdk.audio.AudioConfig(device_name=mic_name) if mic_name else None
...
self.keyword_recognizer = speechsdk.KeywordRecognizer(audio_config=self.audio_config)

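I am passing the microphone name as the device name to speechsdk.audio.AudioConfig. However, from what I can read at https://aka.ms/csspeech/microphone-selection, it seems I need to provide the device ID, not the name or index that pyaudio gives me.

I have been searching online for a way to get the device ID. The only thing I could establish is that the pyobjc package may be needed to talk to the hardware through Objective-C. My attempts with it failed as well.

Does anyone know of an existing library, or an example I can refer to, in which a Python script returns the ID of a microphone device so that I can pass it to the Speech SDK? (I would also like this to work on Windows, but that is a separate question.)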

Answer 1

Method 1:

Use the code below to select one of the available microphones or audio devices, then get the speech output:

import azure.cognitiveservices.speech as speechsdk
import sounddevice as sd
import soundfile as sf

def text_to_speech(text, output_file):
    # Set up the speech config
    speech_config = speechsdk.SpeechConfig(subscription="xxxxxxx5a10", region="eastus")

    # Create a speech synthesizer object
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # Synthesize the text to speech
    result = speech_synthesizer.speak_text_async(text).get()

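    # Save the speech output to a WAV file
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        # result.audio_data is raw bytes, so let the SDK write the file
        # instead of passing the bytes to soundfile.write()
        stream = speechsdk.AudioDataStream(result)
        stream.save_to_wav_file(output_file)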

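def select_microphone():
    # List every audio device that sounddevice can see
    print("Available microphones:")
    for i, device in enumerate(sd.query_devices()):
        print(f"{i}: {device['name']}")

    device_index = int(input("Select microphone index: "))
    return device_index

def play_audio_file(path, device_index):
    # Play the saved WAV file on the selected device
    # (the device must have output channels for playback to work)
    data, samplerate = sf.read(path)
    sd.play(data, samplerate, device=device_index)
    sd.wait()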

def main():
    device_index = select_microphone()
    text = input("Enter the text to convert to speech: ")
    output_file = "output.wav"
    text_to_speech(text, output_file)
    
    play_audio_file(output_file, device_index)

if __name__ == "__main__":
    main()
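
Method 1 lists every device, inputs and outputs alike. As a minimal sketch, the list could be narrowed to devices that can actually record. The helper name input_devices and the sample data are hypothetical; the 'name' and 'max_input_channels' keys are the ones sounddevice reports for each device.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can record audio.

    `devices` is assumed to be a sequence of dict-like entries with
    'name' and 'max_input_channels' keys, as returned by
    sounddevice.query_devices().
    """
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]


# Hypothetical sample data standing in for sd.query_devices()
sample = [
    {"name": "MacBook Pro Microphone", "max_input_channels": 1, "max_output_channels": 0},
    {"name": "MacBook Pro Speakers", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "USB Headset", "max_input_channels": 1, "max_output_channels": 2},
]

for index, name in input_devices(sample):
    print(f"{index}: {name}")
```

With sounddevice installed, it would be called as input_devices(sd.query_devices()). The indices are sounddevice indices, not Speech SDK device IDs, so they only help with picking a device by name.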


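Method 2:

Alternatively, you can put the microphone device ID directly in the code and get the speech output on that specific device. To get the microphone device ID on your Mac, run this command in a terminal: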

system_profiler SPAudioDataType

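This lists the audio devices and their IDs. According to the Microsoft page linked in the question, the ID the Speech SDK expects on macOS is the Core Audio device UID (for example BuiltInMicrophoneDevice). Now use the microphone ID in the code below: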

import os
import azure.cognitiveservices.speech as speechsdk


# Raw string so the backslashes in the device ID are not treated as escape sequences
mic_device_id = r"INTELAUDIO\FUNC_xxxxxxxx_xxxxxEV_10xx\5&1xxxx001"


# Read the key from the SPEECH_KEY environment variable (an assumed name) instead of hard-coding it
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region="eastus")


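# SpeechSynthesizer takes an AudioOutputConfig; for recognition from a microphone,
# use speechsdk.audio.AudioConfig(device_name=mic_device_id) instead
audio_config = speechsdk.audio.AudioOutputConfig(device_name=mic_device_id)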


speech_config.speech_synthesis_voice_name='en-US-JennyNeural'

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)


print("Enter some text that you want to speak >")
text = input()

speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = speech_synthesis_result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the resource key and region values?")
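
The system_profiler step in Method 2 can also be scripted from Python. The sketch below is untested against the Speech SDK and rests on two assumptions: that system_profiler accepts the -json flag (recent macOS versions), and that each input device entry in the JSON carries a "_name" key and a "coreaudio_device_input" key. The function names and the sample data are hypothetical. It returns device names, which are not necessarily the IDs the SDK accepts.

```python
import json
import subprocess


def find_input_devices(node):
    """Recursively collect the names of entries that report input channels.

    Assumption: a device entry is a dict with a "_name" key and, for
    input devices, a "coreaudio_device_input" key. Check this against
    the output on your own machine.
    """
    found = []
    if isinstance(node, dict):
        if "_name" in node and "coreaudio_device_input" in node:
            found.append(node["_name"])
        for value in node.values():
            found.extend(find_input_devices(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_input_devices(item))
    return found


def list_mac_input_devices():
    """Run system_profiler (macOS only) and return the input device names."""
    output = subprocess.run(
        ["system_profiler", "SPAudioDataType", "-json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_input_devices(json.loads(output))


# Hypothetical sample mirroring the assumed structure, so the parser
# can be tried on any platform
sample = {
    "SPAudioDataType": [
        {
            "_name": "coreaudio_device",
            "_items": [
                {"_name": "MacBook Pro Microphone", "coreaudio_device_input": 1},
                {"_name": "MacBook Pro Speakers", "coreaudio_device_output": 2},
            ],
        }
    ]
}

print(find_input_devices(sample))
```

On a Mac, list_mac_input_devices() would run the real command. The recursive walk is deliberate: it avoids depending on the exact nesting of the JSON, which may differ between macOS versions.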

