Google Speech-to-Text API speaker diarization with the Python .long_running_recognize() method

Question

I am following the answer in this question, but my audio is longer than 1 minute, so I have to use .long_running_recognize(config, audio) instead of .recognize(config, audio). Here is the code:
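Because the choice between the two methods depends on the audio length, it can be checked locally before uploading. The following is a minimal sketch that uses only the standard-library wave module. wav_duration_seconds, choose_method and the 60-second SYNC_LIMIT_SECONDS threshold are hypothetical names and an assumed limit (check the current Speech-to-Text quotas); they are not part of the Google client library.

```python
import wave

# Assumed audio-length limit for synchronous recognize(); verify against current quotas.
SYNC_LIMIT_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def choose_method(path: str, limit: float = SYNC_LIMIT_SECONDS) -> str:
    """Hypothetical helper: name the SpeechClient method suited to this file's length."""
    if wav_duration_seconds(path) > limit:
        return "long_running_recognize"
    return "recognize"
```

Pass a plain string path (for example speech_file_name), since wave.open is given a str here rather than a Path.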

from pathlib import Path
#  https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1p1beta1.services.speech.SpeechClient
from google.cloud import speech_v1p1beta1 as speech
from google.cloud import storage

def file_upload(client, file: Path, bucket_name: str = 'wav_files_ua_eu_standard'):
    # https://stackoverflow.com/questions/62125584/file-upload-using-pythonlocal-system-to-google-cloud-storage#:~:text=You%20can%20do%20it%20in%20this%20way%2C%20from,string%20of%20text%20blob.upload_from_string%20%28%27this%20is%20test%20content%21%27%29

    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(file.name)
    # Uploading from local file without open()
    blob.upload_from_filename(file)
    # https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob   
    uri3 = 'gs://' + blob.id[:-(len(str(blob.generation)) + 1)]
    print(F"{uri3=}")
    return uri3


client = speech.SpeechClient()
client_bucket = storage.Client(project='my-project-id-is-hidden')
speech_file_name = R"C:\Users\vasil\OneDrive\wav_samples\wav_sample_phone_call.wav"
speech_file = Path(speech_file_name)

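# Path.exists() must be called; fail early if the local file is missing
if not speech_file.exists():
    raise FileNotFoundError(speech_file)
uri = file_upload(client_bucket, speech_file)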

# audio = speech.RecognitionAudio(content=content)  
audio = speech.RecognitionAudio(uri=uri) 

diarization_config = speech.SpeakerDiarizationConfig(
    enable_speaker_diarization=True,
    min_speaker_count=2,
    max_speaker_count=3,
)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,
    language_code="ru-RU",   # "uk-UA",  "ru-RU",
    # alternative_language_codes=["uk-UA", ],
    diarization_config=diarization_config,
)

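print("Waiting for operation to complete...")
# response = client.recognize(config=config, audio=audio)
# long_running_recognize() returns an operation; .result() blocks until it finishes
operation = client.long_running_recognize(config=config, audio=audio)
result = operation.result(timeout=600)  # timeout in seconds (an arbitrary choice)

words_info = result.results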

# Printing out the output:
for word_info in words_info[0].alternatives[0].words:
    print(f"word: '{word_info.word}', speaker_tag: {word_info.speaker_tag}")

The differences are:

- I have to upload the file for recognition and get the URI of the uploaded file;
- I use speech.RecognitionAudio(uri=uri), not speech.RecognitionAudio(content=content);
- I use client.long_running_recognize(config=config, audio=audio), not client.recognize(config=config, audio=audio).
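The URI passed to speech.RecognitionAudio(uri=...) has the form gs://<bucket>/<object name>, so it can also be composed directly from the bucket and blob names instead of slicing blob.id. A minimal sketch with hypothetical helper names; gcs_uri_from_blob_id reproduces the slicing in file_upload and assumes blob.id has the form bucket/name/generation.

```python
def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Hypothetical helper: build the gs:// URI directly from bucket and object names."""
    return f"gs://{bucket_name}/{blob_name}"


def gcs_uri_from_blob_id(blob_id: str, generation: int) -> str:
    """Mirror of the question's approach: drop the trailing "/<generation>" from blob.id.

    Assumes blob_id looks like "bucket/name/generation".
    """
    return "gs://" + blob_id[: -(len(str(generation)) + 1)]
```

Inside file_upload, returning gcs_uri(bucket_name, blob.name) would give the same URI without depending on the layout of blob.id.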

So the code works, but the result contains no diarization (speaker tag) information. What am I doing wrong? Here is the output; speaker_tag is always 0:

word: 'Алло', speaker_tag: 0
word: 'здравствуйте', speaker_tag: 0
word: 'Я', speaker_tag: 0
word: 'хочу', speaker_tag: 0
word: 'котёнок', speaker_tag: 0
word: 'Ты', speaker_tag: 0
word: 'очень', speaker_tag: 0
word: 'классная', speaker_tag: 0
word: 'Спасибо', speaker_tag: 0
word: 'приятно', speaker_tag: 0
word: 'что', speaker_tag: 0
word: 'вы', speaker_tag: 0
word: 'и', speaker_tag: 0
word: 'Хорошего', speaker_tag: 0
word: 'вам', speaker_tag: 0
word: 'дня', speaker_tag: 0
word: 'сегодня', speaker_tag: 0
word: 'Спасибо', speaker_tag: 0
word: 'до', speaker_tag: 0
word: 'свидания', speaker_tag: 0
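To see which entries of result.results actually carry speaker tags, the tags can be summarized per result. This is a small diagnostic sketch; speaker_tags_per_result is a hypothetical helper that only assumes each result exposes alternatives[0].words with a speaker_tag attribute, as the objects printed above do.

```python
def speaker_tags_per_result(results):
    """For each result, return the sorted distinct speaker_tag values of its top alternative."""
    summary = []
    for res in results:
        if not res.alternatives:
            # No alternatives means no words, hence no tags
            summary.append([])
            continue
        tags = {word.speaker_tag for word in res.alternatives[0].words}
        summary.append(sorted(tags))
    return summary
```

Printing enumerate(speaker_tags_per_result(result.results)) shows the result index next to its tags; a result whose only tag is 0 has no diarization information, since assigned speaker tags start at 1.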

Answer 1

The problem is just one number in the statement

for word_info in words_info[0].alternatives[0].words:

The correct version is:

for word_info in words_info[-1].alternatives[0].words:

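With that change, everything works. When diarization is enabled, the words list of each result's alternative includes all the words from all the results so far, so the last result carries the complete word list with speaker tags, while (as the output in the question shows) the words of the first result all have speaker_tag 0.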
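Once the last result is used, the tagged words can be turned into readable speaker turns. The following is a minimal sketch; speaker_turns is a hypothetical helper (not part of google-cloud-speech) that groups consecutive words with the same speaker_tag using itertools.groupby, and it relies on the last result holding the complete tagged word list, as in the words_info[-1] fix.

```python
from itertools import groupby


def speaker_turns(results):
    """Group the words of the last result into (speaker_tag, text) turns."""
    if not results:
        return []
    # Only the last result is read: it holds the full word list with speaker tags
    words = results[-1].alternatives[0].words
    return [
        (tag, " ".join(w.word for w in group))
        for tag, group in groupby(words, key=lambda w: w.speaker_tag)
    ]
```

For example: for tag, text in speaker_turns(result.results): print(f"speaker {tag}: {text}"). Grouping only consecutive words keeps the conversation order, so a speaker who talks twice appears as two separate turns.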
