Error when trying to read my CSV file in Python


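I am trying to write a Python program that takes a CSV file containing a link in each row (the link is in the last column). When I run the program, I get this error for every row: "Article `download()` failed with No connection adapters were found". The code is: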

```python
import csv
from newspaper import Article
import spacy
import requests

nlp = spacy.load("en_core_web_sm")

URL_COLUMN_INDEX = 4

OUTPUT_FILE_PATH = "output.csv"

visited_urls = set()

with open("20230327113000.export.CSV", "r", encoding="ISO-8859-1") as infile, open(OUTPUT_FILE_PATH, "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    for row in reader:
        if reader.line_num == 1:
            continue

        url = row[-1]

        if url in visited_urls:
            continue

        try:
            article = Article(url)
            article.download()
            article.parse()
        except Exception as e:
            print(f"Error processing article: {url}")
            print(e)
            continue

        summary = article.summary
        doc = nlp(summary)
        entities = [ent.text for ent in doc.ents]

        output_row = [url] + entities
        writer.writerow(output_row)

        visited_urls.add(url)

        print(f"Processed article: {url}")
```
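One likely cause rests on an assumption. The file name `20230327113000.export.CSV` matches the naming used by GDELT 2.0 event export files, and those files are tab-delimited (despite the `.CSV` extension) with no header row. `csv.reader(infile)` splits on commas instead. If the line contains commas (for example in a place name), `row[-1]` is whatever follows the last comma. If it contains none, `row[-1]` is the whole line. In neither case is it a bare URL. requests' default session only has transport adapters mounted for strings starting with `http://` or `https://`, which produces "No connection adapters were found". The sketch below uses a hypothetical, shortened sample line to show the difference. It also defines a hypothetical helper, `iter_source_urls`, that reads the last tab-separated field, skips nothing as a header, and drops anything that is not an http(s) URL:

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical, heavily shortened GDELT-style row: tab-separated fields,
# one of which (a place name) contains commas; the URL is the last field.
SAMPLE = (
    "1095043387\t20230327\tWashington, District of Columbia, United States"
    "\tUS\thttps://example.com/news/story.html\n"
)


def last_field(text, delimiter):
    """Return the last field of the first row parsed with `delimiter`."""
    row = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return row[-1]


def is_http_url(value):
    """True if `value` has an http/https scheme and a network location."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def iter_source_urls(lines):
    """Yield each distinct, well-formed URL from the last tab-separated field.

    Assumes a headerless, tab-delimited file, so the first line is treated
    as data. QUOTE_NONE stops stray double quotes from merging lines.
    """
    seen = set()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:  # skip blank lines
            continue
        url = row[-1].strip()
        if url in seen or not is_http_url(url):
            continue
        seen.add(url)
        yield url


# Comma-delimited parsing returns a fragment, not a URL:
print(repr(last_field(SAMPLE, ",")))
# Tab-delimited parsing returns the URL itself:
print(repr(last_field(SAMPLE, "\t")))
print(list(iter_source_urls(io.StringIO(SAMPLE))))
```

With the real file, the file object (opened with `newline=""`) would be passed to `iter_source_urls`, and each yielded URL would be handed to `Article`. A separate issue: in newspaper3k, `article.summary` is only filled in after `article.nlp()` has been called. Without that call, `summary` in the loop above stays an empty string. Calling `article.nlp()` after `article.parse()`, or running spaCy on `article.text`, avoids that.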

I also tried writing the program with newspaper and with BeautifulSoup, and got the same result.
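Switching to BeautifulSoup would not be expected to change the outcome. BeautifulSoup only parses HTML, and the download still goes through requests (as newspaper3k's downloader does) with the same `row[-1]` string. The following minimal check assumes requests is installed. It reproduces the message offline with `Session.get_adapter`, the method a requests session uses to choose a transport adapter for a URL. The fragment string is a hypothetical example of what a comma split can leave behind:

```python
import requests
from requests.exceptions import InvalidSchema

session = requests.Session()

# Hypothetical comma-split fragment: it contains a URL, but does not *start*
# with http:// or https://, so no mounted adapter prefix matches it.
fragment = " United States\tUS\thttps://example.com/news/story.html"


def adapter_error(sess, url):
    """Return the InvalidSchema message requests raises for `url`, or None."""
    try:
        sess.get_adapter(url)
    except InvalidSchema as exc:
        return str(exc)
    return None


# A well-formed URL maps to the default HTTPAdapter.
print(type(session.get_adapter("https://example.com/news/story.html")).__name__)
# The fragment raises the same "No connection adapters were found" error.
print(adapter_error(session, fragment))
```

Because the failure happens before any network request is made, a check like `url.startswith(("http://", "https://"))` on each value is one way to catch this early, whichever library does the downloading.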

python csv python-newspaper