How to ignore certain elements inside a div class when web scraping

Question

I am trying to scrape a website, but I want to ignore some of the elements inside a div class.

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239")

soup = BeautifulSoup(r.text, 'html.parser')

data = []

for div in soup.find_all("div", class_="bbWrapper"):
    # Elements I am trying to drop before reading the post text.
    unwanted = (
        div.find('blockquote', class_="bbCodeBlock bbCodeBlock--expandable bbCodeBlock--quote"),
        # Was div.find('bbCodeBlock-content'), which looks for a tag with that name, not a class.
        div.find(class_="bbCodeBlock-content"),
        div.find("aside", class_="message-signature"),
    )
    for tag in unwanted:
        if tag is not None:  # find() returns None when nothing matches
            tag.extract()
    result = [div.get_text(strip=True, separator=" ")]
    data.append(result)

My output for data[2] should be the following:

Subaru dealer by me uses an orange construction cone for demo. Find one and try it. Won’t hurt anything if it doesn’t work.

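But it returns everything from the previous message as well. How can I ignore the elements in class_="message-signature", and how do I get just this text? Thanks in advance.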

python web-scraping beautifulsoup
1 Answer
import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    r.raise_for_status()  # stop early if the request failed
    soup = BeautifulSoup(r.content, 'html.parser')
    # Take the third "bbCodeBlock-expandContent" div on the page (index 2).
    target = soup.find_all(
        "div", class_="bbCodeBlock-expandContent")[2].get_text(strip=True)
    print(target)


main("https://www.ranger5g.com/forum/threads/pre-collision-assist.3239/")

Output:

Subaru dealer by me uses an orange construction cone for demo. Find one and try it. Won’t hurt anything if it doesn’t work. 
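
If the goal is the text of every post rather than one fixed index, one possible approach is to remove the unwanted elements from the whole tree first and only then read each bbWrapper div. The sketch below is untested against the live site: SAMPLE_HTML is made-up markup that only borrows the class names from the question, and extract_post_texts is a hypothetical helper name. It assumes quotes are blockquote elements carrying the bbCodeBlock--quote class and that signatures are aside elements with the message-signature class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<article class="message">
  <div class="bbWrapper">First post, no quote.</div>
  <aside class="message-signature"><div class="bbWrapper">Signature A</div></aside>
</article>
<article class="message">
  <div class="bbWrapper">
    <blockquote class="bbCodeBlock bbCodeBlock--expandable bbCodeBlock--quote">
      <div class="bbCodeBlock-content">First post, no quote.</div>
    </blockquote>
    Find an orange cone and try it.
  </div>
  <aside class="message-signature"><div class="bbWrapper">Signature B</div></aside>
</article>
"""


def extract_post_texts(html):
    """Return the text of each bbWrapper div, without quotes or signatures."""
    soup = BeautifulSoup(html, "html.parser")

    # Passing a single class name to class_ matches any element that has
    # that class, even when the element carries several classes.
    unwanted = list(soup.find_all("blockquote", class_="bbCodeBlock--quote"))
    unwanted += soup.find_all("aside", class_="message-signature")

    # extract() detaches each element (and its children) from the tree.
    for tag in unwanted:
        tag.extract()

    return [
        div.get_text(strip=True, separator=" ")
        for div in soup.find_all("div", class_="bbWrapper")
    ]


print(extract_post_texts(SAMPLE_HTML))
```

Removing the signatures before collecting the divs matters in this sample because each signature contains its own bbWrapper div, which would otherwise show up as an extra entry in the result.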