400 Request audio can be a maximum of 10485760 bytes in Speech-to-Text v2 with a GCStorage URI

Problem description

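I am using the Google Speech-to-Text v2 API to transcribe audio with the Chirp model.

I am trying to transcribe an audio file that is longer than 10 minutes. Because local files are limited to 10 MB, I uploaded the file to a Cloud Storage bucket and tried to transcribe it from there.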

```python
import os
import time

from google.api_core import client_options
from google.cloud import speech_v2, storage
from google.cloud.speech_v2.types import cloud_speech

# `credentials` and the `GCStorage` helper class are defined elsewhere (not shown).


def transcribe_audio(audio, language_code):
    start = time.time()
    # Authenticating clients
    client_options_var = client_options.ClientOptions(
        api_endpoint="us-central1-speech.googleapis.com"
    )
    storage_client = storage.Client(credentials=credentials)
    speech_client = speech_v2.SpeechClient(client_options=client_options_var, credentials=credentials)

    gcs = GCStorage(storage_client=storage_client)

    # Creating bucket if not present
    bucket_name = 'text-stores'
    if bucket_name not in gcs.list_buckets():
        bucket_gcs = gcs.create_bucket(bucket_name=bucket_name)
    else:
        bucket_gcs = gcs.get_bucket(bucket_name=bucket_name)

    # Uploading audio file to audio-files folder
    audio_destination = f'audio-files/{audio}'
    audio_path = f'{os.getcwd()}/{audio}'

    gcs.upload_to_bucket(bucket=bucket_gcs, blob_destination=audio_destination, file_path=audio_path)

    # Transcribing audio file

    transcript = ''
    gcs_uri = 'gs://' + bucket_name + '/' + audio_destination

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=[language_code],
        model='chirp',
    )

    request = cloud_speech.RecognizeRequest(
        recognizer="projects/tpus-302411/locations/us-central1/recognizers/my-chirp-recognizer",
        config=config,
        uri=gcs_uri,
    )

    # This synchronous call is the one that fails with the error below
    response = speech_client.recognize(request=request)
```

ERROR: google.api_core.exceptions.InvalidArgument: 400 Request audio can be a maximum of 10485760 bytes.
google-cloud-platform google-cloud-storage google-api-python-client google-speech-api google-speech-to-text-api
1 Answer

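According to the Chirp documentation, files longer than 1 minute should be transcribed with Speech.BatchRecognize rather than Speech.Recognize. The 10485760-byte limit in the error belongs to Speech.Recognize itself, so pointing the request at a GCS URI instead of local bytes does not lift it.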
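For a sense of scale, here is a rough size estimate for a 10-minute recording. It is only a sketch: it assumes uncompressed 16-bit mono PCM at 16 kHz, which the question does not state, and `pcm_size_bytes` is a hypothetical helper, not part of any Google library.

```python
# Limit quoted in the error message: 10485760 bytes = 10 * 1024 * 1024 (10 MiB).
MAX_RECOGNIZE_BYTES = 10 * 1024 * 1024


def pcm_size_bytes(duration_s, sample_rate_hz=16_000, bytes_per_sample=2, channels=1):
    """Approximate size of uncompressed PCM audio (container header ignored)."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


ten_minutes = pcm_size_bytes(10 * 60)
print(ten_minutes)                        # 19200000
print(ten_minutes > MAX_RECOGNIZE_BYTES)  # True
```

Under those assumptions the file is nearly twice the limit, so the synchronous method is ruled out regardless of where the audio is stored.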

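Chirp processes speech in much larger chunks than other models do. This means it might not be suitable for true, real-time use. Chirp is available through the following API methods:

  • v2 Speech.Recognize (good for short audio < 1 min)
  • v2 Speech.BatchRecognize (good for long audio 1 min to 8 hrs)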
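The duration ranges quoted above can be turned into a small dispatch helper. This is a minimal sketch: `pick_chirp_method` and the constants are hypothetical names, and the thresholds are taken only from the two bullets above.

```python
ONE_MINUTE_S = 60
EIGHT_HOURS_S = 8 * 60 * 60


def pick_chirp_method(duration_s):
    """Return the v2 method name that the quoted guidance suggests."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s < ONE_MINUTE_S:
        return "Speech.Recognize"
    if duration_s <= EIGHT_HOURS_S:
        return "Speech.BatchRecognize"
    raise ValueError("audio is longer than the 8 hours quoted for BatchRecognize")


print(pick_chirp_method(45))       # Speech.Recognize
print(pick_chirp_method(10 * 60))  # Speech.BatchRecognize
```

For the 10-minute file in the question, this lands on Speech.BatchRecognize.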

Chirp is not available on the following API methods:

  • v2 Speech.StreamingRecognize
  • v1 Speech.StreamingRecognize
  • v1 Speech.Recognize
  • v1 Speech.LongRunningRecognize
  • v1p1beta1 Speech.StreamingRecognize
  • v1p1beta1 Speech.Recognize
  • v1p1beta1 Speech.LongRunningRecognize
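If you want to guard against picking an unsupported method in your own code, the two lists above fold into a simple lookup. This is a sketch only: `CHIRP_METHODS` and `chirp_available` are hypothetical names, and the set reflects the lists as quoted here, which may change as the documentation is updated.

```python
# (API version, method) pairs listed above as supporting Chirp.
CHIRP_METHODS = {
    ("v2", "Recognize"),
    ("v2", "BatchRecognize"),
}


def chirp_available(api_version, method):
    """True if the quoted documentation lists Chirp for this method."""
    return (api_version, method) in CHIRP_METHODS


print(chirp_available("v2", "BatchRecognize"))      # True
print(chirp_available("v1", "LongRunningRecognize"))  # False
```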

I got it working by using the BatchRecognize example from python-docs-samples:
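```python
# `client`, `project_id`, `config`, and `gcs_uri` are assumed to be set up as in
# the question (where the client is named `speech_client`).
file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=gcs_uri)

request = cloud_speech.BatchRecognizeRequest(
    # The sample uses the global location; the location here should match the
    # client's endpoint (the question's client targets us-central1).
    recognizer=f"projects/{project_id}/locations/global/recognizers/_",
    config=config,
    files=[file_metadata],
    # Return the transcript inline in the response instead of writing it to GCS
    recognition_output_config=cloud_speech.RecognitionOutputConfig(
        inline_response_config=cloud_speech.InlineOutputConfig(),
    ),
)

# Transcribes the audio into text (starts a long-running operation)
operation = client.batch_recognize(request=request)

print("Waiting for operation to complete...")
# 120 s comes from the sample; long recordings may need a larger timeout
response = operation.result(timeout=120)

for result in response.results[gcs_uri].transcript.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
```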
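If you need one transcript string rather than a line per result, the loop above can be wrapped in a small helper. This is a sketch, not part of the sample: `join_transcript` is a hypothetical name, and the guard for results without alternatives is a defensive assumption about what a long transcription may contain. The stand-in objects only mimic the attributes the loop reads.

```python
from types import SimpleNamespace


def join_transcript(results):
    """Concatenate the top alternative of each result, skipping empty ones."""
    pieces = []
    for result in results:
        if result.alternatives:  # guard: a result may carry no alternatives
            pieces.append(result.alternatives[0].transcript.strip())
    return " ".join(pieces)


# Stand-in objects shaped like the response results, for illustration only.
fake_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="first segment ")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="second segment")]),
]
print(join_transcript(fake_results))  # first segment second segment
```

With the real response you would call `join_transcript(response.results[gcs_uri].transcript.results)`.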