I want to save the data I have scraped from the New York Times web page to a .txt file.
import urllib.request
from bs4 import BeautifulSoup
# URL
html_page = 'https://www.nytimes.com/'
page = urllib.request.urlopen(html_page)
soup = BeautifulSoup(page, "html.parser")
title_box = soup.find_all("h2", class_="css-bzeb53 esl82me2")
print(title_box)
# Extract titles from list
titles = []
for occurrence in title_box:
    titles.append(occurrence.text.strip())
print(titles)
Everything works up to this point, but I can't manage to create the .txt file and save the data to it.
# Save the Headlines
filename = '/home/stephan/Documents/NYHeads.txt'
with open(filename, 'w') as file_object:
    file_object.write(titles)
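Running that snippet should raise a TypeError instead of saving anything, and because open() has already succeeded by then, the file is created but left empty. A minimal offline reproduction, assuming a hard-coded list of hypothetical headlines in place of the scraped titles and a temporary file in place of the path above:

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped headlines.
titles = ["First headline", "Second headline"]

# Temporary location instead of /home/stephan/Documents/NYHeads.txt.
path = os.path.join(tempfile.mkdtemp(), "NYHeads.txt")

caught = None
try:
    with open(path, "w", encoding="utf-8") as file_object:
        file_object.write(titles)  # passing a list, not a str
except TypeError as exc:
    caught = exc
    print("TypeError:", exc)

# open() succeeded, so the file exists, but nothing was written to it.
print(os.path.exists(path), os.path.getsize(path))
```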
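The problem is that the write() method of a file opened in text mode only accepts a string, but in your program titles is a list, so the call raises a TypeError. You need to convert titles to a string first. This should work: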
filename = '/home/stephan/Documents/NYHeads.txt'
with open(filename, 'w') as file_object:
    file_object.write(str(titles))
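Note that str(titles) writes the list's Python representation, brackets, quotes and commas included, all on one line. If one headline per line is wanted instead, joining the strings with newlines is a common alternative. A minimal sketch, assuming hypothetical sample titles and a temporary file in place of the original path; passing encoding="utf-8" also avoids depending on the platform's default encoding:

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped headlines.
titles = ["First headline", "Second headline"]

# Temporary location instead of /home/stephan/Documents/NYHeads.txt.
path = os.path.join(tempfile.mkdtemp(), "NYHeads.txt")

# str() on a list produces its repr, which is what the fix above writes.
as_repr = str(titles)

# Join the headlines with newlines so each one lands on its own line.
with open(path, "w", encoding="utf-8") as file_object:
    file_object.write("\n".join(titles) + "\n")

# Read the file back to check what was saved.
with open(path, encoding="utf-8") as file_object:
    lines = file_object.read().splitlines()

print(as_repr)
print(lines)
```

The trailing "\n" keeps the file ending with a newline, which many line-oriented tools expect.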