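I am using the Google Speech-to-Text v2 API to transcribe audio with the Chirp model.
I am trying to transcribe an audio file that is longer than 10 minutes. Since local files are limited to 10 MB, I uploaded the file to a bucket and tried to transcribe it from there.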
```
import os
import time

from google.api_core import client_options
from google.cloud import speech_v2, storage
from google.cloud.speech_v2.types import cloud_speech

# `credentials` and the `GCStorage` helper class are not shown here.


def transcribe_audio(audio, language_code):
    start = time.time()

    # Authenticating clients
    client_options_var = client_options.ClientOptions(
        api_endpoint="us-central1-speech.googleapis.com"
    )
    storage_client = storage.Client(credentials=credentials)
    speech_client = speech_v2.SpeechClient(
        client_options=client_options_var, credentials=credentials
    )
    gcs = GCStorage(storage_client=storage_client)

    # Creating bucket if not present
    bucket_name = 'text-stores'
    if bucket_name not in gcs.list_buckets():
        bucket_gcs = gcs.create_bucket(bucket_name=bucket_name)
    else:
        bucket_gcs = gcs.get_bucket(bucket_name=bucket_name)

    # Uploading audio file to audio-files folder
    audio_destination = f'audio-files/{audio}'
    audio_path = f'{os.getcwd()}/{audio}'
    gcs.upload_to_bucket(
        bucket=bucket_gcs, blob_destination=audio_destination, file_path=audio_path
    )

    # Transcribing audio file
    transcript = ''
    gcs_uri = 'gs://' + bucket_name + '/' + audio_destination
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=[language_code],
        model='chirp',
    )
    request = cloud_speech.RecognizeRequest(
        recognizer="projects/tpus-302411/locations/us-central1/recognizers/my-chirp-recognizer",
        config=config,
        uri=gcs_uri,
    )
    # This synchronous call is the one that fails with the error below
    response = speech_client.recognize(request=request)
```
ERROR: google.api_core.exceptions.InvalidArgument: 400 Request audio can be a maximum of 10485760 bytes.
According to the Chirp documentation, for files longer than 1 minute we should use Speech.BatchRecognize instead of Speech.Recognize.
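Chirp processes speech in much larger chunks than other models do. This means it might not be suitable for true real-time use. Chirp is available through the following API methods:
- v2 Speech.Recognize (good for short audio < 1 min)
- v2 Speech.BatchRecognize (good for long audio, 1 min to 8 hrs)
Chirp is not available on the following API methods:
- v2 Speech.StreamingRecognize
- v1 Speech.StreamingRecognize
- v1 Speech.Recognize
- v1 Speech.LongRunningRecognize
- v1p1beta1 Speech.StreamingRecognize
- v1p1beta1 Speech.Recognize
- v1p1beta1 Speech.LongRunningRecognize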
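
The duration ranges quoted above can be turned into a small guard that picks the method before any request is built. This is only a sketch: the function name `choose_chirp_method` and the two constants are hypothetical, and the thresholds are simply the ones from the list above, with exactly 1 minute treated as "long" audio.

```python
from datetime import timedelta

# Duration bounds copied from the documentation excerpt above.
SHORT_AUDIO_LIMIT = timedelta(minutes=1)
BATCH_AUDIO_LIMIT = timedelta(hours=8)


def choose_chirp_method(duration: timedelta) -> str:
    """Return the v2 method name suited to audio of the given duration.

    Hypothetical helper, not part of the Speech-to-Text client library.
    """
    if duration <= timedelta(0):
        raise ValueError("duration must be positive")
    if duration < SHORT_AUDIO_LIMIT:
        return "Speech.Recognize"
    if duration <= BATCH_AUDIO_LIMIT:
        return "Speech.BatchRecognize"
    raise ValueError("audio longer than 8 hours is outside the documented range")


# A file longer than 10 minutes, as in the question, falls in the batch range.
print(choose_chirp_method(timedelta(minutes=10, seconds=30)))
```

For the file in the question this selects Speech.BatchRecognize, which matches the fix described next.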
I got it working by using the BatchRecognize sample from python-docs-samples:
    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=gcs_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{project_id}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        # Return the transcript inline in the response
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[gcs_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
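
If a single transcript string is wanted (the question initialises `transcript = ''` for that purpose), the per-result loop above could be replaced by a small joiner along these lines. This is a sketch, not part of the sample: `join_transcript` is a hypothetical name, and skipping results with an empty `alternatives` list is an assumption made so that indexing `alternatives[0]` cannot raise. The `SimpleNamespace` objects only imitate the shape of the response so the snippet runs without credentials.

```python
from types import SimpleNamespace


def join_transcript(results) -> str:
    """Concatenate the top alternative of each result into one string.

    Hypothetical helper; `results` is any iterable of objects that expose
    `.alternatives`, each alternative exposing `.transcript`.
    """
    parts = []
    for result in results:
        # Assumption: a result may carry no alternatives, so skip those.
        if not result.alternatives:
            continue
        parts.append(result.alternatives[0].transcript.strip())
    return " ".join(parts)


# Stand-in objects shaped like the response results, for illustration only.
fake_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello ")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript=" world")]),
]
print(join_transcript(fake_results))
```

With the real response, the call would presumably be `join_transcript(response.results[gcs_uri].transcript.results)`, using the same access path as the loop above.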