Python newspaper Article function not reading article URLs inside a loop?


Sorry if this is a silly question. I'm new to Python and much more familiar with Excel VBA.

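I'm trying to have Python loop through multiple article URLs in an Excel document and create a summary of each one. The goal is to export each article's title, summary, and URL to a new Excel workbook (or another tab). (The end goal is to scrape relevant news and summarize it, but I'm working my way up to that!)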

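However, I'm having trouble getting newspaper's Article function to read the URLs passed in from the list I created. When I print a URL, it looks exactly the same as when I copy and paste it and set url = 'copy-pasted value'. But when I run Article on that URL, it doesn't seem to read the URL properly. The URLs are stored as strings in a list. I'm not sure what I'm doing wrong. Any help would be greatly appreciated!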

# Import the libraries
import nltk
from newspaper import Article
import openpyxl

# import the URLs from the Excel
from openpyxl import load_workbook
wb = load_workbook(r'C:\Users\Python\RunPythonScript.xlsm')  # Work Book
ws = wb.get_sheet_by_name('URLs')  # Work Sheet
column = ws['A']  # Column
column_list = [column[x].value for x in range(len(column))] # create a list
url_list = list(filter(None, column_list)) # remove blanks
url_list.pop(0) # remove title

# start loop
x = 0
while x < len(url_list):


   url = str("'" + url_list[x] + "'") # set url  
   article = Article(url) # Get the article ### seems to be where error is ###
   print(article)

   x = x + 1 # move to next url

I get the following output from Python:

<newspaper.article.Article object at 0x07DADB38>
<newspaper.article.Article object at 0x0A698670>
<newspaper.article.Article object at 0x07DADB38>
<newspaper.article.Article object at 0x0A698670>
<newspaper.article.Article object at 0x07DADB38>
<newspaper.article.Article object at 0x0A698670>
<newspaper.article.Article object at 0x07DADB38>
<newspaper.article.Article object at 0x0A698670>
<newspaper.article.Article object at 0x07DADB38>
<newspaper.article.Article object at 0x0A698670>

Instead of printing the article, it seems to be failing on the URL.

Any insights? Thanks in advance!

python nltk python-3.7 python-3.8 python-newspaper
1 Answer

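When you call print() on an object, Python builds the object's string representation by calling its __str__ method. The lines in your output are that default representation of each Article object, not an error.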
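To illustrate the point, here is a small sketch that does not use newspaper at all. The classes `Plain` and `Described` are made up for this example: one relies on the default string form, the other defines its own `__str__`.

```python
class Plain:
    """No __str__ defined, so print() falls back to the default form."""
    pass


class Described:
    """Defines __str__, so print() shows something readable."""

    def __init__(self, url):
        self.url = url

    def __str__(self):
        return f"Described(url={self.url})"


plain_text = str(Plain())
described_text = str(Described("https://example.com"))

# The first line has the same shape as the output in the question:
# "<module.Plain object at 0x...>" (module name and address will vary).
print(plain_text)
print(described_text)
```

The output in the question has the same shape as the first printed line, which suggests the Article objects were created and that print() simply had nothing more descriptive to show.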

If you need to print some data from the Article, such as its URL, do:

print(article.url)

For more information about Article, see: https://newspaper.readthedocs.io/en/latest/
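Going one step further, here is a rough sketch of how the loop in the question might be adapted to get a title and summary. It rests on a few assumptions. `clean_url` and `summarize` are hypothetical helper names, not part of newspaper. The code assumes the newspaper3k package, where an Article holds no content until download() and parse() have been called, and where nlp() fills in the summary (it needs NLTK's punkt data to be installed). Also note that the question builds each URL as "'" + url_list[x] + "'", which puts literal quote characters into the string, so the sketch strips them.

```python
def clean_url(raw):
    """Strip surrounding whitespace and any stray quote characters."""
    return str(raw).strip().strip("'\"")


def summarize(url):
    """Download one article and return its title, summary, and URL.

    Hypothetical helper; needs network access and the newspaper3k package.
    """
    # Imported here so clean_url() can be used without newspaper installed.
    from newspaper import Article

    article = Article(clean_url(url))
    article.download()  # fetch the page HTML
    article.parse()     # extract title, text, authors, ...
    article.nlp()       # build summary and keywords (requires NLTK 'punkt')
    return {
        "title": article.title,
        "summary": article.summary,
        "url": article.url,
    }
```

With this in place, the while loop could probably be replaced by something like `for url in url_list: print(summarize(url)["title"])`, and the returned dictionaries could later be written to a new sheet.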
