Google Cloud Storage: __init__() got an unexpected keyword argument 'total_size'


I am building a tool to transcribe interviews for a contract I am working on. For this, I wrote code with the following flow:

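  1. After input validation, the audio file (m4a) is converted to WAV and stored locally.
  2. The WAV file is then sent to my Google Cloud Storage bucket.
  3. Finally, the WAV file in GCS is transcribed.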

Currently, step 1 works fine and step 3 has yet to be verified. I am stuck on step 2, because apparently google.resumable_media does not recognize some of the arguments I pass.

Console output:

You are using credentials from (***********.json). Continue? (y/n) y
You are using the following audio file (audio\*********.m4a). Continue? (y/n) y
You are using the following destination file (audio\audio.wav). Continue? (y/n) y
Converting m4a file...
Fetching audio from local storage...
2024-01-14 17:24:19,872 - DEBUG - subprocess.call(['ffmpeg', '-y', '-f', 'mp4', '-i', 'audio\\Parcours TI-12_Zoom_2023-11-13.m4a', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
subprocess.call(['ffmpeg', '-y', '-f', 'mp4', '-i', 'audio\\***********.m4a', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
2024-01-14 17:24:24,599 - DEBUG - Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
Error during cloud upload: __init__() got an unexpected keyword argument 'total_size'
Cloud upload failed. Unable to transcribe.
PS C:\Users\nicol\Documents\Work\IRSST Transcriptions 2024> 

The problematic snippet:

    @staticmethod
    def send_to_gcs(audio_file, destination_uri):
        """
        Send the converted WAV file to the GCS bucket.

        Parameters:
        - audio_file: Audio file path.
        - destination_uri: Destination URI.
        """
        try:
            # Create a storage client
            storage_client = storage.Client.from_service_account_json(GOOGLE_CREDENTIALS_PATH)

            # Configure resumable upload with a timeout
            upload_url = f"https://storage.googleapis.com/upload/storage/v1/b/{BUCKET_NAME}/o?uploadType=resumable"
            request_method = "POST"
            chunk_size = 10 * 1024 * 1024  # Adjust as needed
            total_size = os.path.getsize(audio_file)
            timeout_seconds = 5 * 60  # Adjust as needed

            upload = google.resumable_media.requests.upload.ResumableUpload(
                upload_url,
                chunk_size=chunk_size,
                total_size=total_size,
                request_method=request_method,
                session=storage_client._http,
                timeout=timeout_seconds,
            )

            # Open the audio file and perform the upload
            with open(audio_file, "rb") as audio_data:
                upload.initiate(session=storage_client._http)
                upload.transmit(
                    audio_data,
                    callback=common.resumable_upload_callback,
                    session=storage_client._http,
                )

            print(f"File uploaded to {destination_uri}")
            return destination_uri
        except Exception as e:
            print(f"Error during cloud upload: {e}")
            return None

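I am considering switching to a newer version of google.resumable_media, but I am afraid of breaking things. I have been reworking this code for several days and would like to start transcribing as soon as possible.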

python google-cloud-platform google-cloud-storage speech-to-text
1 Answer

As deceze commented, ResumableUpload does not support the total_size and timeout parameters. The API documentation gives the following signature:

ResumableUpload(upload_url, chunk_size, checksum=None, headers=None)
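
To see which of the question's keyword arguments that signature rejects, here is a minimal sketch. `FakeResumableUpload` and `unsupported_kwargs` are hypothetical names made up for illustration: the class only copies the constructor signature quoted above and is not the real library class.

```python
import inspect


class FakeResumableUpload:
    """Hypothetical stand-in that copies only the documented constructor signature."""

    def __init__(self, upload_url, chunk_size, checksum=None, headers=None):
        self.upload_url = upload_url
        self.chunk_size = chunk_size
        self.checksum = checksum
        self.headers = headers


def unsupported_kwargs(callable_obj, kwargs):
    """Return the keyword names that callable_obj would reject."""
    params = inspect.signature(callable_obj).parameters
    # A **kwargs parameter would accept any name, so nothing would be rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


# The keyword arguments used in the question's snippet (values are placeholders).
attempted = {
    "chunk_size": 10 * 1024 * 1024,
    "total_size": 123,
    "request_method": "POST",
    "session": None,
    "timeout": 300,
}

rejected = unsupported_kwargs(FakeResumableUpload, attempted)
print(rejected)
```

Under that signature, request_method and session are rejected as well as total_size and timeout, so removing only total_size would most likely move the error to the next unexpected keyword.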

You will probably need to pass total_bytes and timeout to the initiate method instead.
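
A minimal sketch of that idea follows. It assumes a reasonably recent google-resumable-media in which initiate takes (transport, stream, metadata, content_type, total_bytes, timeout) and transmit_next_chunk takes (transport, timeout). The function names `resumable_upload_url` and `upload_wav`, the object name, and the timeout values are assumptions made for illustration, and `client._http` is the same private attribute the question already relies on.

```python
import os

CHUNK_SIZE = 10 * 1024 * 1024  # resumable chunks must be a multiple of 256 KiB


def resumable_upload_url(bucket_name):
    """Build the resumable-upload endpoint used in the question."""
    return (
        "https://storage.googleapis.com/upload/storage/v1/b/"
        f"{bucket_name}/o?uploadType=resumable"
    )


def upload_wav(audio_file, bucket_name, object_name, credentials_path,
               timeout=(61, 300)):
    """Upload audio_file in chunks and return its gs:// URI (hypothetical helper)."""
    # Imported here so the helper above can be used without the Google packages.
    from google.cloud import storage
    from google.resumable_media.requests import ResumableUpload

    client = storage.Client.from_service_account_json(credentials_path)
    transport = client._http  # authorized session (private attribute)

    # Only the upload URL and chunk size go to the constructor.
    upload = ResumableUpload(resumable_upload_url(bucket_name), CHUNK_SIZE)

    with open(audio_file, "rb") as stream:
        # Size and timeout are passed when the session is initiated.
        upload.initiate(
            transport,
            stream,
            {"name": object_name},  # object metadata
            "audio/wav",
            total_bytes=os.path.getsize(audio_file),
            timeout=timeout,
        )
        # Send one chunk per request until the server reports completion.
        while not upload.finished:
            upload.transmit_next_chunk(transport, timeout=timeout)

    return f"gs://{bucket_name}/{object_name}"
```

The timeout here is a (connect, read) pair in seconds, and it applies to each request rather than to the upload as a whole.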

Although it is a bit dated, here is a good example of using ResumableUpload.
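
Separately from that example: if driving ResumableUpload directly is not a requirement, the higher-level Blob API in google-cloud-storage may be enough, since it accepts a timeout itself. This is a sketch under that assumption, not something suggested in the question or the answer above. `gcs_uri` and `upload_with_blob` are hypothetical names, and the timeout and chunk-size values are placeholders.

```python
def gcs_uri(bucket_name, object_name):
    """Format the gs:// URI of an object."""
    return f"gs://{bucket_name}/{object_name}"


def upload_with_blob(audio_file, bucket_name, object_name, credentials_path,
                     timeout=300):
    """Upload audio_file through Blob.upload_from_filename (hypothetical helper)."""
    # Imported here so gcs_uri can be used without the Google packages installed.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(credentials_path)
    blob = client.bucket(bucket_name).blob(object_name)

    # Chunk size used when the library performs a resumable upload;
    # it must be a multiple of 256 KiB.
    blob.chunk_size = 10 * 1024 * 1024

    # The library handles the upload session; timeout is in seconds per request.
    blob.upload_from_filename(audio_file, content_type="audio/wav", timeout=timeout)

    return gcs_uri(bucket_name, object_name)
```

With this approach there is no upload URL, request method, or session to configure by hand, which removes the arguments that triggered the error.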
