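I'm running into trouble with a project I'm working on.
I have a CSV file whose first column contains all the URLs.
My script below currently opens the file and iterates over each row, but as soon as it reaches find_all it raises the following error: IndexError: list index out of range.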
import requests
from bs4 import BeautifulSoup
import csv

with open('1.csv', "r", newline="") as inFile, open("1output.csv", "w", newline="") as outFile:
    next(inFile)
    reader = csv.reader(inFile)
    writer = csv.writer(outFile)
    for row in reader:
        subURL = row[0]
        # Parse the HTML from the website
        URL = 'https://www.example.com/{}'.format(subURL)
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, 'html.parser')
        # find iframe on webpage and get the src of the iframe
        iframeDesc = soup.find_all('iframe')[0]
        pageDesc = requests.get(iframeDesc['src'])
        soupDesc = BeautifulSoup(pageDesc.content, 'html.parser')
        # Get Description from iframe Desc
        itemDesc = soupDesc.find_all('div', id="div_01")
The error occurs on this line:
iframeDesc = soup.find_all('iframe')[0]
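There could be several causes for your problem; let me walk you through the most likely one.
Also, I suspect you are looking for the wrong node in the tree. This happens a lot with BS, because you are essentially digging through the DOM and it is entirely possible for a tag to be missing. Just put a few print statements around your code to see what those lines actually contain.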
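To make that concrete: find_all returns an empty list when nothing matches, so indexing it with [0] raises IndexError on any page that has no iframe in the HTML that requests received. Here is a minimal sketch of one way to guard against that, using find, which returns None instead of raising. The helper name first_iframe_src and the sample HTML strings are made up for illustration; they are not part of your script.

```python
from bs4 import BeautifulSoup


def first_iframe_src(html):
    """Return the src of the first <iframe> in html, or None if there is none.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    iframe = soup.find("iframe")  # find() gives None when no tag matches
    if iframe is None:
        return None
    return iframe.get("src")  # .get() gives None when the attribute is missing


# Made-up sample pages: one with an iframe, one without.
with_iframe = '<html><body><iframe src="https://www.example.com/desc.html"></iframe></body></html>'
without_iframe = "<html><body><p>No frames here</p></body></html>"

for html in (with_iframe, without_iframe):
    src = first_iframe_src(html)
    if src is None:
        # This is where your original [0] would have raised IndexError.
        print("no iframe found, skipping")
    else:
        print("iframe src:", src)
```

In your loop you would pass page.content to a check like this and skip (or log) the rows where it returns None. Printing page.status_code and the start of page.text for those rows should show whether the iframe is really absent from the response, for example because the page builds it with JavaScript or because the request was redirected or blocked.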