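I am trying to write a script that calls the Google Translation API to translate every row of an Excel file with 1000 rows.
I use pandas to load the file and read the values from a specific column, append those values to a list, and then translate the list with the Google API: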
    import os
    from google.cloud import translate_v2 as translate
    import pandas as pd
    from datetime import datetime

    # Variable for GCP service account credentials
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'path to credentials json'

    # Path to the file
    filepath = r'../file.xlsx'

    # Instantiate the Google Translation API Client
    translate_client = translate.Client()

    # Read all the information from the Excel file within 'test' sheet name
    df = pd.read_excel(filepath, sheet_name='test')

    # Define an empty list
    elements = []

    # Loop the data frame and append the list
    for i in df.index:
        elements.append(df['EN'][i])

    # Loop the list and translate each line
    for item in elements:
        output = translate_client.translate(
            elements,
            target_language='fr'
        )
        result = [
            element['translatedText'] for element in output
        ]
        print("The values corresponding to key : " + str(result))
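After appending, the list holds 1000 elements. The problem with the Google Translation API is that if you send too many "segments", as they call them, in one request, it returns the following error:
400 POST https://translation.googleapis.com/language/translate/v2: Too many text segments
I looked into it and found that sending 100 lines at a time (in my case) would be a solution. Now I am a bit stuck.
How would I write a loop that takes 100 rows at a time, translates those 100 rows, does something with the result, and then moves on to the next 100, and so on until the end?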
Assuming you can pass a list into a single translate call, you could do something like this:
    # Define a helper to step through the list in chunks
    def chunker(seq, size):
        return (seq[pos : pos + size] for pos in range(0, len(seq), size))

    # Then iterate and handle each chunk accordingly
    output = []
    for chunk in chunker(elements, 100):
        temp = translate_client.translate(
            chunk,
            target_language='fr'
        )
        output.extend(temp)
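
To check the batching logic without credentials or network access, here is a minimal, self-contained sketch of the same chunking idea. The names `translate_in_batches` and `fake_translate` are hypothetical and not part of any Google library; `fake_translate` only mimics the list of dicts with a `translatedText` key that the question's code reads from the client's response.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunker(seq: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of seq, each at most `size` items long."""
    for pos in range(0, len(seq), size):
        yield seq[pos:pos + size]


def translate_in_batches(
    texts: Sequence[str],
    translate_fn: Callable[[List[str]], List[Dict[str, str]]],
    batch_size: int = 100,
) -> List[str]:
    """Translate texts batch by batch and return the translated strings in order."""
    translated: List[str] = []
    for chunk in chunker(texts, batch_size):
        results = translate_fn(list(chunk))
        # Keep only the translated text from each result dict.
        translated.extend(r['translatedText'] for r in results)
    return translated


def fake_translate(batch: List[str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the real client call (an assumption for testing).

    It rejects oversized batches, roughly like the API does, and tags each
    input instead of translating it.
    """
    if len(batch) > 100:
        raise ValueError('Too many text segments')
    return [{'input': text, 'translatedText': 'fr:' + text} for text in batch]


elements = ['line {}'.format(i) for i in range(250)]
translated = translate_in_batches(elements, fake_translate)
print(len(translated), translated[0], translated[-1])
```

With the real client, the same helper could presumably be called as `translate_in_batches(elements, lambda batch: translate_client.translate(batch, target_language='fr'))`. Because the results come back in input order, assigning them to a new column, for example `df['FR'] = translated`, should line up with the original rows, provided every row of `df['EN']` was sent.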