Orchestrating batch requests - Document AI - GCP

Question

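I have to process PDF documents with Document AI. I'm trying to use batch processing, but it only lets me process 50 documents per request. All my files are in the same folder of a storage bucket, and I can't figure out how to split them into requests of 50 files each.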

I'm trying to extract information from scanned documents, about 800 of them.
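Whatever input method is used, the core step is splitting the list of roughly 800 files into lists of at most 50, one list per batch request. A minimal pure-Python sketch; the `scans/` folder and file names are hypothetical, and the limit of 50 is the one quoted in the error message in this thread:

```python
from typing import Iterator, List, Sequence

MAX_DOCS_PER_REQUEST = 50  # limit quoted in the InvalidArgument error


def chunked(items: Sequence[str], size: int = MAX_DOCS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical file list: 800 scanned PDFs in one bucket folder.
uris = [f"gs://docstts/scans/doc{n:03d}.pdf" for n in range(800)]
batches = list(chunked(uris))
print(len(batches), len(batches[0]), len(batches[-1]))  # 16 50 50
```

Each inner list can then be sent as one request, either directly as a list of input documents or by first moving its files into a dedicated folder.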

Answer 1

I split the files into batches of 50 and process them one batch at a time, but now I get this error:

InvalidArgument: 400 Request contains an invalid argument. [field_violations {
   field: "Maximum number of input documents that can be specified in a request is restricted to 50"
}

even though each batch contains only 50 files. For some reason, some files are also processed more than once.
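One cause worth ruling out when the files were moved with a script: if the destination is written as `f"folder_name/{filename}"` instead of `f"{folder_name}/{filename}"`, then `folder_name` is literal text, every file lands in a single folder called `folder_name`, and a request using that folder as its input prefix covers all ~800 files. A minimal offline check, assuming hypothetical object names, that plans the moves and counts how many files each folder would receive before anything is copied:

```python
from collections import Counter
from typing import Dict, List


def plan_moves(file_paths: List[str], group_size: int = 50) -> Dict[str, str]:
    """Map each source object name to its destination name in folder lote{i}/."""
    plan = {}
    for start in range(0, len(file_paths), group_size):
        folder_name = f"lote{start // group_size}"
        for path in file_paths[start:start + group_size]:
            filename = path.split("/")[-1]
            plan[path] = f"{folder_name}/{filename}"  # braces interpolate the variable
    return plan


def files_per_folder(plan: Dict[str, str]) -> Counter:
    """Count how many objects would land under each top-level folder."""
    return Counter(dest.split("/")[0] for dest in plan.values())


# Hypothetical object names for 120 files.
paths = [f"scans/doc{n:03d}.pdf" for n in range(120)]
plan = plan_moves(paths)
print(dict(files_per_folder(plan)))  # {'lote0': 50, 'lote1': 50, 'lote2': 20}

# The same plan with the literal string: everything ends up in one folder.
buggy = {p: f"folder_name/{p.split('/')[-1]}" for p in paths}
print(dict(files_per_folder(buggy)))  # {'folder_name': 120}
```

Asserting `max(files_per_folder(plan).values()) <= 50` before running the real copy catches this kind of mistake cheaply.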

Moving the files into folders:

from google.cloud import storage

bucket_name = "docstts"
client = storage.Client()
bucket = client.get_bucket(bucket_name)

# Collect the names of all objects in the bucket
file_paths = [blob.name for blob in bucket.list_blobs()]

# Split the files into groups of 50 (the per-request limit)
file_groups = [file_paths[x:x+50] for x in range(0, len(file_paths), 50)]

# Move each group of 50 files into its own folder: lote0/, lote1/, ...
for i, file_group in enumerate(file_groups):
    folder_name = f"lote{i}"
    for file_path in file_group:
        blob = bucket.blob(file_path)
        filename = blob.name.split("/")[-1]
        # {folder_name} must be interpolated; the literal "folder_name/..."
        # would put every file into a single folder named "folder_name"
        bucket.copy_blob(blob, bucket, f"{folder_name}/{filename}")
        blob.delete()  # Cloud Storage has no move: copy, then delete the original
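Once the files sit in `lote0/`, `lote1/`, ... folders, each folder can be submitted as its own batch request with a `GcsPrefix` input. The sketch below assumes the `google-cloud-documentai` v1 Python client; the project ID, location, processor ID and output URI are hypothetical placeholders. It waits for each operation to finish before submitting the next one, which is slow but keeps the orchestration simple.

```python
# Hypothetical settings: replace with your own project, processor and output location.
PROJECT_ID = "my-project"
LOCATION = "us"
PROCESSOR_ID = "my-processor-id"
BUCKET_NAME = "docstts"
OUTPUT_URI = "gs://docstts/output"


def folder_prefixes(bucket_name: str, num_groups: int) -> list:
    """gs:// prefixes of the lote0/, lote1/, ... folders created by the move script.

    The trailing slash keeps lote1/ from also matching lote10/, lote11/, ...
    """
    return [f"gs://{bucket_name}/lote{i}/" for i in range(num_groups)]


def process_folders(prefixes: list, timeout: float = 1800) -> None:
    """Submit one batch request per folder and wait for each before the next."""
    # Imported here so folder_prefixes() works without the client library installed.
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options={"api_endpoint": f"{LOCATION}-documentai.googleapis.com"}
    )
    processor_name = client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)

    for i, prefix in enumerate(prefixes):
        request = documentai.BatchProcessRequest(
            name=processor_name,
            # Every file under this prefix becomes an input document (at most 50).
            input_documents=documentai.BatchDocumentsInputConfig(
                gcs_prefix=documentai.GcsPrefix(gcs_uri_prefix=prefix)
            ),
            # Write each folder's results to its own output subfolder.
            document_output_config=documentai.DocumentOutputConfig(
                gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(
                    gcs_uri=f"{OUTPUT_URI}/lote{i}/"
                )
            ),
        )
        operation = client.batch_process_documents(request=request)
        print(f"Waiting for batch {i}: {prefix}")
        operation.result(timeout=timeout)  # raises if the batch fails or times out
```

For about 800 files, `process_folders(folder_prefixes("docstts", 16))` would submit 16 requests, one per folder. Keeping the output location outside the `lote*/` folders matters: otherwise results written by one batch could be picked up as input by another.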
