如何使用 OpenAI API 从文档中提取数据? [已关闭]

问题描述 投票:0回答:1

我想从租赁协议中提取关键条款。

为此,我想将合同的 PDF 发送到 AI 服务,该服务必须以 JSON 格式返回一些关键条款。

有哪些不同的图书馆和公司可以做到这一点?到目前为止,我已经探索了 OpenAI API,但它并不像我想象的那么简单。

使用ChatGPT接口时,它运行得很好,所以我认为使用API应该同样简单。

看来我需要先阅读PDF文本,然后将文本发送到OpenAI API。

任何其他实现这一目标的想法将不胜感激。

python artificial-intelligence openai-api openai-assistants-api
1个回答
1
投票

您要使用的是 Assistants API

截至今天,有 3 个工具可用:

您需要使用知识检索工具。正如官方OpenAI文档中所述:

检索增强了助手的外部知识 模型,例如专有产品信息或提供的文档 由您的用户。文件上传并传递给助手后, OpenAI 将自动对您的文档进行分块、索引并存储 嵌入,并实现矢量搜索来检索相关内容 回答用户的疑问。

我过去构建过一个客户支持聊天机器人。以此为例。就您而言,您希望助手使用您的 PDF 文件(我使用的是

knowledge.txt
文件)。看看我的 GitHubYouTube

customer_support_chatbot.py

import os
from openai import OpenAI
client = OpenAI()
OpenAI.api_key = os.getenv('OPENAI_API_KEY')

# Step 1: Upload a File with an "assistants" purpose
my_file = client.files.create(
  file=open("knowledge.txt", "rb"),
  purpose='assistants'
)
print(f"This is the file object: {my_file} \n")

# Step 2: Create an Assistant
my_assistant = client.beta.assistants.create(
    model="gpt-3.5-turbo-1106",
    instructions="You are a customer support chatbot. Use your knowledge base to best respond to customer queries.",
    name="Customer Support Chatbot",
    tools=[{"type": "retrieval"}]
)
print(f"This is the assistant object: {my_assistant} \n")

# Step 3: Create a Thread
my_thread = client.beta.threads.create()
print(f"This is the thread object: {my_thread} \n")

# Step 4: Add a Message to a Thread
my_thread_message = client.beta.threads.messages.create(
  thread_id=my_thread.id,
  role="user",
  content="What can I buy in your online store?",
  file_ids=[my_file.id]
)
print(f"This is the message object: {my_thread_message} \n")

# Step 5: Run the Assistant
my_run = client.beta.threads.runs.create(
  thread_id=my_thread.id,
  assistant_id=my_assistant.id,
  instructions="Please address the user as Rok Benko."
)
print(f"This is the run object: {my_run} \n")

# Step 6: Periodically retrieve the Run to check on its status to see if it has moved to completed
while my_run.status in ["queued", "in_progress"]:
    keep_retrieving_run = client.beta.threads.runs.retrieve(
        thread_id=my_thread.id,
        run_id=my_run.id
    )
    print(f"Run status: {keep_retrieving_run.status}")

    if keep_retrieving_run.status == "completed":
        print("\n")

        # Step 7: Retrieve the Messages added by the Assistant to the Thread
        all_messages = client.beta.threads.messages.list(
            thread_id=my_thread.id
        )

        print("------------------------------------------------------------ \n")

        print(f"User: {my_thread_message.content[0].text.value}")
        print(f"Assistant: {all_messages.data[0].content[0].text.value}")

        break
    elif keep_retrieving_run.status == "queued" or keep_retrieving_run.status == "in_progress":
        pass
    else:
        print(f"Run status: {keep_retrieving_run.status}")
        break
© www.soinside.com 2019 - 2024. All rights reserved.