Python: How to scrape the content of a list tag inside a head tag with specific text


I'm using requests and BeautifulSoup.

I want to scrape the "Objectives" section, but I get the following error:

AttributeError: 'NoneType' object has no attribute 'next_sibling'

I'd also like to create a CSV table for each lesson.

import requests
from bs4 import BeautifulSoup

url = "https://studio.code.org/s/web-development-2023/lessons/1"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
element = soup.find(text="Students will be able to:")
text = element.next_sibling.get_text()

print(text)
Tags: python, web-scraping, beautifulsoup, attributeerror, nonetype
1 Answer

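The text you see on the page is stored as JSON inside an attribute of a <script> element (the data-lesson attribute), not as regular text in the HTML. find(text=...) only matches text nodes, so it returns None, and calling .next_sibling on None raises the AttributeError. To get the objectives, you can decode that JSON instead: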

import json

import requests
from bs4 import BeautifulSoup

url = "https://studio.code.org/s/web-development-2023/lessons/1"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

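# The lesson data (including the objectives) is stored as JSON in the data-lesson attribute
data = json.loads(soup.select_one("[data-lesson]")["data-lesson"])

# Print the description of each objective
for o in data["objectives"]:
    print(o["description"])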

Prints:

Create a prototype of a web design to meet the needs of a user using the problem-solving process
Identify features of a web design that match the needs of users
Understand the steps of the problem-solving process
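The question also asks for a CSV table per lesson, which the answer does not cover. Below is a minimal sketch that reuses the answer's data-lesson approach and writes the objectives to a CSV file with the standard csv module. It assumes (unverified) that other lessons live at the same URL pattern with a different lesson number, that each page has the same data-lesson element, and that its JSON has the same "objectives" list. The helper names fetch_lesson_data and objectives_to_csv, and the "objective" column header, are hypothetical.

```python
import csv
import json


def fetch_lesson_data(lesson: int) -> dict:
    """Download one lesson page and decode its data-lesson JSON (same approach as the answer)."""
    # Imported lazily so the CSV helper below works even without requests/bs4 installed.
    import requests
    from bs4 import BeautifulSoup

    # Assumption: every lesson follows the URL pattern of lesson 1.
    url = f"https://studio.code.org/s/web-development-2023/lessons/{lesson}"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    element = soup.select_one("[data-lesson]")
    if element is None:
        raise ValueError(f"no [data-lesson] element found for lesson {lesson}")
    return json.loads(element["data-lesson"])


def objectives_to_csv(data: dict, path: str) -> int:
    """Write one row per objective description to `path`; return the number of rows written."""
    objectives = data.get("objectives", [])
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["objective"])  # hypothetical header name
        for o in objectives:
            writer.writerow([o["description"]])
    return len(objectives)
```

For example, a loop like for n in range(1, 4): objectives_to_csv(fetch_lesson_data(n), f"lesson_{n}.csv") would produce one file per lesson; the lesson range is hypothetical, so adjust it to the lessons that actually exist. Keeping the download and the CSV writing in separate functions lets you test the CSV part without network access.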