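I'm trying to write a Python program that takes a CSV file in which every row contains a link (the link is in the last column). When I run the program, I get this error for every row: "Article `download()` failed with No connection adapters were found".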
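`newspaper` downloads pages through the `requests` library, and `requests` typically raises "No connection adapters were found for ..." when the string it is given contains a colon but does not begin with `http`. One example is a whole line of data that merely contains a URL somewhere inside it. A quick way to see whether that is happening is to check each value before calling `Article(url)`. The sketch below uses only the standard library; `looks_like_http_url` is a hypothetical helper name, not part of `newspaper` or `requests`.

```python
from urllib.parse import urlparse


def looks_like_http_url(value: str) -> bool:
    """Return True if value parses as an absolute http(s) URL with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_http_url("https://example.com/story.html"))  # True
print(looks_like_http_url("example.com/story.html"))          # False: no scheme
# A tab-separated line that only *contains* a URL is not itself a URL:
print(looks_like_http_url("1\t20230327\thttps://example.com/a"))  # False
```

Inside the loop, printing `repr(url)` for the rows where this returns `False` shows exactly what is being passed to `Article`.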
Here is the code:
```python
import csv
from newspaper import Article
import spacy
import requests

nlp = spacy.load("en_core_web_sm")
URL_COLUMN_INDEX = 4
OUTPUT_FILE_PATH = "output.csv"
visited_urls = set()

with open("20230327113000.export.CSV", "r", encoding="ISO-8859-1") as infile, open(OUTPUT_FILE_PATH, "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        # Skip the first line (treated as a header)
        if reader.line_num == 1:
            continue
        # The link is expected in the last column
        url = row[-1]
        if url in visited_urls:
            continue
        try:
            article = Article(url)
            article.download()
            article.parse()
        except Exception as e:
            print(f"Error processing article: {url}")
            print(e)
            continue
        summary = article.summary
        doc = nlp(summary)
        # Collect the named entities spaCy finds in the summary
        entities = [ent.text for ent in doc.ents]
        output_row = [url] + entities
        writer.writerow(output_row)
        visited_urls.add(url)
        print(f"Processed article: {url}")
```
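A likely cause, assuming the input is a GDELT 2.0 events export (the `YYYYMMDDHHMMSS.export.CSV` filename suggests this): as far as I know those files are tab-delimited despite the `.CSV` extension, have no header row, and carry the source URL in the last field. With `csv.reader`'s default comma delimiter, `row[-1]` is whatever follows the last comma on the line (often the tail of a place name plus the rest of the line), so `Article` never receives a bare URL. The sketch below reads the last field with `delimiter="\t"`; `iter_source_urls` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_source_urls(infile: TextIO) -> Iterator[str]:
    """Yield each distinct last-field value that starts with http:// or https://.

    Assumes a tab-delimited file with no header row (hypothetical helper).
    """
    seen = set()
    for row in csv.reader(infile, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        url = row[-1].strip()
        if not url.startswith(("http://", "https://")):
            continue  # not a URL: skip it instead of passing it to Article()
        if url in seen:
            continue  # already handled this URL
        seen.add(url)
        yield url


# Made-up tab-delimited rows; the second repeats the first URL.
sample = io.StringIO(
    "1\t20230327\tParis, Ile-de-France, France\thttps://example.com/a\n"
    "2\t20230327\tParis, Ile-de-France, France\thttps://example.com/a\n"
    "3\t20230327\tLondon, London, City of, United Kingdom\thttps://example.com/b\n"
)
print(list(iter_source_urls(sample)))  # ['https://example.com/a', 'https://example.com/b']

# The same text read with the default comma delimiter: row[-1] is not a URL.
sample.seek(0)
print(repr([row[-1] for row in csv.reader(sample)][0]))  # ' France\thttps://example.com/a'
```

With the real file, open it with `newline=""` and iterate `for url in iter_source_urls(infile):` in place of `for row in reader:`; this also stops the first record from being skipped as a header. Separately, in `newspaper3k` the `article.summary` attribute stays empty unless `article.nlp()` is called after `article.parse()`.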
I have tried writing the program with both newspaper and BeautifulSoup and get the same result.