Getting "ValueError: Encountered text corresponding to disallowed special token '<|endoftext|>'" when running CrewAI

Problem description

I am running CrewAI with the Ollama Phi3 model, and it works fine as long as I don't ask it to generate sample Python code. Things get complicated when the developer agent passes its response to the HumanProxy agent: the crew never finishes the work and keeps producing the following output:

2024-05-02 14:24:07,361 - 129651365437888 - manager.py-manager:281 - WARNING: Error in TokenCalcHandler.on_llm_start callback: ValueError("Encountered text corresponding to disallowed special token '<|endoftext|>'. If you want this text to be encoded as a special token, pass it to allowed_special, e.g. allowed_special={'<|endoftext|>', ...}. If you want this text to be encoded as normal text, disable the check for this token by passing disallowed_special=(enc.special_tokens_set - {'<|endoftext|>'}). To disable this check for all special tokens, pass disallowed_special=().")
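For context, this is tiktoken's standard error when encode() meets a special-token string inside ordinary text. A minimal sketch that reproduces it and shows both fixes the message suggests (using the cl100k_base encoding purely as an assumption for illustration; any tiktoken encoding whose special-token set contains '<|endoftext|>' behaves the same):

```python
import tiktoken

# Assumption: cl100k_base is only an example encoding for this sketch.
enc = tiktoken.get_encoding("cl100k_base")

text = "model output ...<|end|><|endoftext|>"

# enc.encode(text)  # raises the ValueError quoted above

# Fix 1: encode '<|endoftext|>' as an actual special token
tokens = enc.encode(text, allowed_special={"<|endoftext|>"})

# Fix 2: treat it as plain text by disabling the special-token check
tokens = enc.encode(text, disallowed_special=())
```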

Output from running the Crew:

>>> result = crew.kickoff()
 [DEBUG]: == Working Agent: Senior Manager
 [INFO]: == Starting Task: Create python script that scrapes the websites and saves the output into the text file for further processing.
  Manage the work with your crew to answer the request. 
  Check with Human if the response was helpful and if it is not satisfactory then try again. 


> Entering new CrewAgentExecutor chain...
Thought: I need to delegate tasks efficiently among my crew members to create a Python script for web scraping that saves output into a text file.
Action: Delegate work to co-worker
Action Input: {"coworker": "Python Developer", "task": "Create a Python script for web scraping and saving the output into a text file.", "context": "The task involves using libraries like Beautiful Soup (BeautifulSoup) or Scrapy to scrape websites. The data extracted should be saved in a structured format, such as JSON or CSV, which can then be processed further."}

> Entering new CrewAgentExecutor chain...
Thought: To complete this task effectively, I need to select a suitable library for web scraping, define the target website URL, specify the data elements to be extracted, and then write Python code that will scrape the desired information and save it in JSON format.

Final Answer:
```python
import requests
from bs4 import BeautifulSoup
import json

# Define the URL of the website you want to scrape
url = 'https://example.com'  # Replace with actual target website URL

def web_scrape(target_url):
    try:
        response = requests.get(target_url)
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Assuming we want to extract data from elements with a specific class or id
        data_elements = []  # Replace this with the actual logic for selecting data elements
        for element in soup.select('your-selector'):  # Replace 'your-selector' with appropriate CSS selector(s)
            extracted_data = {
                'element_id': element['id'],  # Assuming each element has an id attribute
                'text': element.get_text()
            }
            data_elements.append(extracted_data)
        
        # Save the scraped data into a JSON file
        with open('output.json', 'w') as json_file:
            json.dump(data_elements, json_file, indent=4)
            
    except requests.RequestException as e:
        print(f"An error occurred while making the request: {e}")

# Execute the web scraping function
web_scrape(url)
```
Replace `'https://example.com'` with your actual target website URL and adjust the data extraction logic according to the specific elements you need to scrape from that site.<|end|><|endoftext|>

> Finished chain.
 

[The same final answer, ending with <|end|><|endoftext|>, is then printed again verbatim as the task output.]
Tags: python, ollama
1 Answer

Well, <|endoftext|> is the default stop token in many LLM models. I think you have <|endoftext|> somewhere in your prompt. Either remove the stop token from the prompt, or pass allowed_special={'<|endoftext|>'} in the API call.
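In this setup the tokenization happens inside CrewAI's TokenCalcHandler rather than in the asker's own code, so the simplest workaround may be to strip the marker from agent output before it is passed on and re-tokenized. A minimal sketch, assuming a hypothetical sanitize() helper (not part of CrewAI) applied to the developer agent's answer before it reaches the HumanProxy agent:

```python
# Hypothetical helper: remove Phi3's end markers from a piece of agent
# output so the token-counting callback never sees '<|endoftext|>'.
def sanitize(text: str) -> str:
    for marker in ("<|endoftext|>", "<|end|>"):
        text = text.replace(marker, "")
    return text.strip()

print(sanitize("web_scrape(url)\n<|end|><|endoftext|>"))
# -> web_scrape(url)
```

Either way, the underlying issue is the same: the trailing <|end|><|endoftext|> markers that Phi3 appends to its final answer are what trip the token counter.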
