I have this loop:

```python
for index, row in df.iterrows():
    process_row(index, row)
```

where `process_row` is a method that calls the API twice:
```python
import openai


def process_row(index, row):
    print("Evaluating row index:", index)
    question = row["Question"]
    answer = row["Answer"]
    instruct = "..."
    instruct2 = "..."
    try:
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=[{"role": "user", "content": instruct}]
        )
        response = completion["choices"][0]["message"]["content"]
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=[{"role": "user", "content": instruct2}]
        )
        response2 = completion["choices"][0]["message"]["content"]
        # .... OTHER CODE ....
    except Exception as e:
        print(e)
```
If a single iteration of the method takes longer than 30 seconds, I want it to do the following instead:

```python
min_vote = 10
row_with_vote = row.tolist() + [min_vote]
passed_writer.writerow(row_with_vote)
```
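How can I do this? I tried `concurrent.futures` but saw no improvement; I can add that attempt to the post if you like. I've looked at other posts, but they check the elapsed time after each statement, and I'm fairly sure that won't work in my case because the program gets stuck on a single line. Also, what could make the method so slow? Most iterations take only a few seconds, but occasionally one takes 10 minutes or more, which is the problem.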
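On the `concurrent.futures` attempt: one plausible reason it made no difference is that a `with ThreadPoolExecutor() as pool:` block calls `shutdown(wait=True)` when it exits, so it still waits for a hung API call. The sketch below assumes a single long-lived pool and uses hypothetical helpers `slow_task` and `run_with_timeout`; `future.result(timeout=...)` stops *waiting* after the limit, which is the point where you could write the fallback row. It does not stop the worker thread: the stuck call keeps running in the background, occupies a worker, and Python will still wait for it when the interpreter exits.

```python
import concurrent.futures
import time

# One long-lived pool for the whole loop. Wrapping the loop in
# `with ThreadPoolExecutor() as pool:` would wait for hung tasks on exit.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def slow_task(seconds):
    # Hypothetical stand-in for process_row: just sleeps.
    time.sleep(seconds)
    return f"finished after {seconds} s"


def run_with_timeout(func, *args, timeout):
    """Return (True, result), or (False, None) if `timeout` seconds pass first.

    On timeout the worker thread is NOT stopped; it keeps running in the background.
    """
    future = pool.submit(func, *args)
    try:
        return True, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the task had not started yet
        return False, None


print(run_with_timeout(slow_task, 0.1, timeout=1))  # (True, 'finished after 0.1 s')
print(run_with_timeout(slow_task, 2, timeout=0.5))  # (False, None)
```

After the loop, `pool.shutdown(wait=False)` releases the pool without blocking on stragglers.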
Drawing on this answer, try using the `signal` package to define a timeout:
```python
import signal


def signal_handler(signum, frame):
    raise Exception("Timed out!")


def long_function_call():
    while True:  # simulate a call that never returns
        pass


signal.signal(signal.SIGALRM, signal_handler)  # SIGALRM: Unix only, main thread only
signal.alarm(3)  # Three seconds
try:
    long_function_call()
except Exception:
    print("Timed out!")
```
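The same pattern can be wrapped in a reusable context manager that always cancels the alarm and restores the previous handler, even when the body finishes early or raises. This is a sketch; `time_limit` is a hypothetical helper name, and like any `SIGALRM` approach it only works on Unix, only in the main thread, and only with whole-second limits.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit(seconds):
    """Raise TimeoutError if the body runs longer than `seconds` (Unix, main thread)."""
    def handler(signum, frame):
        raise TimeoutError(f"Timed out after {seconds} s")

    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


try:
    with time_limit(1):
        time.sleep(3)
except TimeoutError as e:
    print(e)  # prints: Timed out after 1 s
```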
So your code could look something like this:
```python
import signal
import time

import pandas as pd


# dummy function
def process_row(index, row):
    time.sleep(index)
    print(f"Processed index {index}")


# dummy data
df = pd.DataFrame(columns=["a"], index=range(10))


def signal_handler(signum, frame):
    raise Exception("Timed out!")


signal.signal(signal.SIGALRM, signal_handler)  # register the handler once

for index, row in df.iterrows():
    signal.alarm(5)  # 5 second timeout
    try:
        process_row(index, row)
    except Exception:
        print("Timed out!")
    finally:
        signal.alarm(0)  # cancel the alarm once the row is done
```
Output:

```text
Processed index 0
Processed index 1
Processed index 2
Processed index 3
Processed index 4
Timed out!
Timed out!
Timed out!
Timed out!
Timed out!
```
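Applied to the question's loop, one caveat matters: `process_row` wraps its API calls in `except Exception as e: print(e)`, which would catch an `Exception` (or `TimeoutError`) raised by the signal handler and just print it, so the fallback would never run. A minimal sketch, assuming a hypothetical `RowTimeout` derived from `BaseException` (so broad `except Exception` clauses let it through), a stand-in `process_row` that sleeps for a hypothetical `delay` column, and any `csv.writer`-like `passed_writer`: rows that exceed the limit get `min_vote` appended and are written out.

```python
import csv
import signal
import sys
import time

import pandas as pd

TIMEOUT_SECONDS = 30  # per-row limit from the question
MIN_VOTE = 10


class RowTimeout(BaseException):
    """Hypothetical exception; BaseException so `except Exception` won't swallow it."""


def signal_handler(signum, frame):
    raise RowTimeout("Timed out!")


def process_row(index, row):
    # Hypothetical stand-in for the real process_row, keeping its broad except.
    try:
        time.sleep(float(row["delay"]))  # pretend this is the slow API call
    except Exception as e:
        print(e)


def run(df, passed_writer, timeout=TIMEOUT_SECONDS):
    signal.signal(signal.SIGALRM, signal_handler)
    for index, row in df.iterrows():
        signal.alarm(timeout)  # whole seconds only
        try:
            process_row(index, row)
        except RowTimeout:
            # Fallback from the question: append min_vote and write the row.
            passed_writer.writerow(row.tolist() + [MIN_VOTE])
        finally:
            signal.alarm(0)  # cancel any pending alarm before the next row


# Demo with a 1-second limit: only the second row times out.
demo = pd.DataFrame({"Question": ["q1", "q2"], "Answer": ["a1", "a2"], "delay": [0, 2]})
run(demo, csv.writer(sys.stdout), timeout=1)  # writes: q2,a2,2,10
```

Deriving from `BaseException` is a deliberate choice here; narrowing the `except` inside the real `process_row` to the API's own error types would work as well.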