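I'm building a tool to transcribe interviews for a contract I'm working on. To do that, I wrote code with the following flow:
Step 1 currently works, and step 3 is still to be verified. I'm stuck on the second part of the flow, because apparently google.resumable_media does not recognize some of the arguments I'm passing.
Console output: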
You are using credentials from (***********.json). Continue? (y/n) y
You are using the following audio file (audio\*********.m4a). Continue? (y/n) y
You are using the following destination file (audio\audio.wav). Continue? (y/n) y
Converting m4a file...
Fetching audio from local storage...
2024-01-14 17:24:19,872 - DEBUG - subprocess.call(['ffmpeg', '-y', '-f', 'mp4', '-i', 'audio\\Parcours TI-12_Zoom_2023-11-13.m4a', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
subprocess.call(['ffmpeg', '-y', '-f', 'mp4', '-i', 'audio\\***********.m4a', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
2024-01-14 17:24:24,599 - DEBUG - Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
Error during cloud upload: __init__() got an unexpected keyword argument 'total_size'
Cloud upload failed. Unable to transcribe.
PS C:\Users\nicol\Documents\Work\IRSST Transcriptions 2024>
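
The `unexpected keyword argument 'total_size'` message above is a `TypeError` raised when a constructor receives a keyword it does not declare. As a minimal sketch of how to check which keywords a callable accepts in the installed version, the helper below compares a set of keyword arguments against the callable's signature. `unexpected_kwargs` and `FakeUpload` are hypothetical names used only for illustration; `FakeUpload` is a stand-in and does not claim to mirror the real library class.

```python
import inspect


def unexpected_kwargs(func, kwargs):
    """Return the names in kwargs that func's signature does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing can be unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    return [name for name in kwargs if name not in accepted]


class FakeUpload:
    # Hypothetical stand-in for illustration, NOT the real library class.
    def __init__(self, upload_url, chunk_size, headers=None):
        self.upload_url = upload_url
        self.chunk_size = chunk_size
        self.headers = headers


print(unexpected_kwargs(FakeUpload, {"chunk_size": 1, "total_size": 2, "timeout": 3}))
# → ['total_size', 'timeout']
```

Passing the real class (for example `google.resumable_media.requests.ResumableUpload`) instead of `FakeUpload` in the same environment would show which of the keywords used in the snippet are rejected by the installed version, without changing any packages.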
Problematic snippet:
@staticmethod
def send_to_gcs(audio_file, destination_uri):
    """
    Send the converted WAV file to the GCS bucket.

    Parameters:
    - audio_file: Audio file path.
    - destination_uri: Destination URI.
    """
    try:
        # Create a storage client
        storage_client = storage.Client.from_service_account_json(GOOGLE_CREDENTIALS_PATH)

        # Configure resumable upload with a timeout
        upload_url = f"https://storage.googleapis.com/upload/storage/v1/b/{BUCKET_NAME}/o?uploadType=resumable"
        request_method = "POST"
        chunk_size = 10 * 1024 * 1024  # Adjust as needed
        total_size = os.path.getsize(audio_file)
        timeout_seconds = 5 * 60  # Adjust as needed

        upload = google.resumable_media.requests.upload.ResumableUpload(
            upload_url,
            chunk_size=chunk_size,
            total_size=total_size,
            request_method=request_method,
            session=storage_client._http,
            timeout=timeout_seconds,
        )

        # Open the audio file and perform the upload
        with open(audio_file, "rb") as audio_data:
            upload.initiate(session=storage_client._http)
            upload.transmit(
                audio_data,
                callback=common.resumable_upload_callback,
                session=storage_client._http,
            )

        print(f"File uploaded to {destination_uri}")
        return destination_uri
    except Exception as e:
        print(f"Error during cloud upload: {e}")
        return None
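I'm considering moving to a newer version of google.resumable_media, but I'm worried about breaking things. I've been reworking this code for several days and would like to start transcribing as soon as possible.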
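
One option that avoids calling google.resumable_media directly is the higher-level `Blob.upload_from_filename` method of google-cloud-storage, which chooses the upload strategy itself for larger files. The following is only a sketch under stated assumptions, not a verified fix: it assumes `destination_uri` has the form `gs://bucket/object`, it takes the credentials path as a `credentials_path` parameter instead of the `GOOGLE_CREDENTIALS_PATH` constant, and `split_gcs_uri` is a hypothetical helper introduced here. It has not been run against a real bucket.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not a valid gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def send_to_gcs(audio_file, destination_uri, credentials_path,
                chunk_size=10 * 1024 * 1024, timeout_seconds=5 * 60):
    """Upload audio_file to destination_uri; return the URI, or None on failure."""
    # Imported lazily so split_gcs_uri stays usable without the package installed.
    from google.cloud import storage

    try:
        # Assumption: destination_uri looks like gs://bucket/object.
        bucket_name, object_name = split_gcs_uri(destination_uri)
        client = storage.Client.from_service_account_json(credentials_path)
        # chunk_size must be a multiple of 256 KiB; 10 MiB satisfies that.
        blob = client.bucket(bucket_name).blob(object_name, chunk_size=chunk_size)
        blob.upload_from_filename(
            audio_file,
            content_type="audio/wav",
            timeout=timeout_seconds,
        )
        print(f"File uploaded to {destination_uri}")
        return destination_uri
    except Exception as e:
        print(f"Error during cloud upload: {e}")
        return None
```

The function keeps the same return contract as the original snippet (the URI on success, `None` on failure), so the transcription step that follows would not need to change.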