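I have set up an Azure Cognitive Services instance that listens for a key phrase through the microphone. This works fine, but I cannot tell it to listen to a specific microphone on my Apple Mac.
My code is: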
mic_name = self.preferences.get('mic_name', None)
self.audio_config = speechsdk.audio.AudioConfig(device_name=mic_name) if mic_name else None
...
self.keyword_recognizer = speechsdk.KeywordRecognizer(audio_config=self.audio_config)
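I am passing the microphone name as the device name to speechsdk.audio.AudioConfig. However, from what I can read at https://aka.ms/csspeech/microphone-selection, it seems I need to provide the device ID rather than the name or index that pyaudio gives me.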
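I have been searching online for a way to get the device ID, and the only thing I have been able to work out is that the pyobjc package may be needed to talk to the hardware through Objective-C. My attempts with it have failed as well.
Does anyone know of an existing library, or an example I can refer to, where a Python script returns the ID of a microphone device so that I can pass it to the Speech SDK? (I would also like this to work on Windows, but that is a separate question.)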
Method 1:
Use the code below to select one of the available microphones or audio devices, then get the speech output:
import azure.cognitiveservices.speech as speechsdk
import sounddevice as sd
import soundfile as sf

def text_to_speech(text, output_file):
    # Set up the speech config
    speech_config = speechsdk.SpeechConfig(subscription="xxxxxxx5a10", region="eastus")
    # Create a speech synthesizer; audio_config=None keeps the audio in memory
    # instead of also playing it on the default speaker
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    # Synthesize the text to speech
    result = speech_synthesizer.speak_text_async(text).get()
    # Save the speech output to a WAV file
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        speechsdk.AudioDataStream(result).save_to_wav_file(output_file)

def select_microphone():
    # Lists every audio device sounddevice knows about (inputs and outputs)
    print("Available microphones:")
    for i, device in enumerate(sd.query_devices()):
        print(f"{i}: {device['name']}")
    device_index = int(input("Select microphone index: "))
    return device_index

def play_audio_file(path, device_index):
    # Play the saved file on the chosen device and wait until playback ends
    data, sample_rate = sf.read(path)
    sd.play(data, sample_rate, device=device_index)
    sd.wait()

def main():
    device_index = select_microphone()
    text = input("Enter the text to convert to speech: ")
    output_file = "output.wav"
    text_to_speech(text, output_file)
    play_audio_file(output_file, device_index)

if __name__ == "__main__":
    main()
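Since the listing in Method 1 prints every device, outputs included, it may help to narrow it down to devices that can actually record. Here is a minimal sketch, assuming the entries look like the dicts returned by sounddevice's query_devices() (which carry a max_input_channels key). The helper name input_devices and the sample data are made up for illustration.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def input_devices(devices: Sequence[Mapping[str, Any]]) -> List[Tuple[int, str]]:
    """Return (index, name) pairs for devices that report input channels."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device.get("max_input_channels", 0) > 0
    ]


if __name__ == "__main__":
    # Hypothetical sample shaped like sounddevice.query_devices() entries.
    sample = [
        {"name": "Built-in Microphone", "max_input_channels": 2, "max_output_channels": 0},
        {"name": "Built-in Output", "max_input_channels": 0, "max_output_channels": 2},
        {"name": "USB Headset", "max_input_channels": 1, "max_output_channels": 2},
    ]
    for index, name in input_devices(sample):
        print(f"{index}: {name}")
```

With the real library you would call input_devices(sd.query_devices()). Keep in mind that these are sounddevice indices, which are not the device IDs the Speech SDK asks for.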
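Method 2:
Alternatively, you can put the microphone device ID directly in the code and get the speech output on that specific device. To get the microphone device ID on your Mac, run this command in the terminal:
system_profiler SPAudioDataType
This lists the audio devices and their properties. Note that the page linked in the question describes the macOS device ID as the Core Audio device UID, so confirm that the value you copy is that identifier.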
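If you would rather read that listing from Python than from the terminal, something like the following might work. It is only a sketch: it assumes system_profiler accepts the -json flag (recent macOS versions) and that each device entry uses the keys "_name" and "coreaudio_device_input". Both key names are assumptions on my part, so check them against the output on your own machine. It returns device names only.

```python
import json
import subprocess
from typing import Any, Dict, List


def parse_input_devices(profiler_json: str) -> List[str]:
    """Return the names of devices that report at least one input channel."""
    data: Dict[str, Any] = json.loads(profiler_json)
    names: List[str] = []
    for section in data.get("SPAudioDataType", []):
        for item in section.get("_items", []):
            # "coreaudio_device_input" is an assumed key for the input channel count.
            if item.get("coreaudio_device_input", 0):
                names.append(item.get("_name", "<unnamed>"))
    return names


def list_mac_input_devices() -> List[str]:
    """Run system_profiler (macOS only) and return the input device names."""
    completed = subprocess.run(
        ["system_profiler", "-json", "SPAudioDataType"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_input_devices(completed.stdout)
```

Splitting the parsing from the subprocess call means you can try parse_input_devices on a saved copy of the JSON without being on a Mac.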
Now use the microphone ID in the code below:
import os
import azure.cognitiveservices.speech as speechsdk

# Raw string so the backslashes in the device ID are not read as escape sequences
device_id = r"INTELAUDIO\FUNC_xxxxxxxx_xxxxxEV_10xx\5&1xxxx001"
# Read the key from the environment instead of hard-coding it
# (SPEECH_KEY is an assumed variable name)
speech_config = speechsdk.SpeechConfig(subscription=os.environ["SPEECH_KEY"], region="eastus")
# SpeechSynthesizer sends audio to an output device, so it takes an AudioOutputConfig.
# For microphone input, pass speechsdk.audio.AudioConfig(device_name=...) to a recognizer instead.
audio_config = speechsdk.audio.AudioOutputConfig(device_name=device_id)
speech_config.speech_synthesis_voice_name = 'en-US-JennyNeural'
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

print("Enter some text that you want to speak >")
text = input()
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = speech_synthesis_result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the resource key and region values?")
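One thing to watch with device IDs that contain backslashes, like the Windows-style one above: in a normal Python string literal a sequence such as \5 is an octal escape, so the ID that reaches the SDK is silently different from what you typed. A small illustration with a made-up ID (not a real device):

```python
# Hypothetical ID used only to show the escaping behaviour.
plain = "AUDIO\5&1"      # \5 is read as the single control character chr(5)
raw = r"AUDIO\5&1"       # raw string keeps the backslash and the digit
escaped = "AUDIO\\5&1"   # doubling the backslash is equivalent to the raw form

print(len(plain), len(raw))  # 8 9
print(raw == escaped)        # True
print("\\" in plain)         # False
```

Using a raw string (or doubled backslashes) for the ID avoids this.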