How to get specific parts of the HTML body with BS4

Question (votes: 0, answers: 1)

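My current solution is to grab the data with soup.text, clean it up very manually with a few regular expressions, and then split it. However, I believe there is an easier way using BS4 commands.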

The desired output is each company's display name, basic price, discounted price and selvrisiko (deductible).

URL:

https://forsikringsguiden.dk/signalr/poll?transport=longPolling&messageId=d-D7589F50-A%2C0%7CtK%2C1%7CtL%2C1%7CtM%2C0&clientProtocol=1.4&connectionToken=UEBdftp4dljKH%2Fw56kprFeB7pnlcXiEv6OqR7mKdzYGoT48tuUFIahljyCEjdyaOn%2BqbxkERLSzuO3QA%2Bwh4BrWIKWlE4rzLhzJnDPedGyOo0Yar2KU7QLCWphtOeava&connectionData=%5B%7B%22name%22%3A%22insuranceofferrequesthub%22%7D%5D&tid=4&_=1572591981773

My code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://forsikringsguiden.dk/signalr/poll?transport=longPolling&messageId=d-D7589F50-A%2C0%7CtK%2C1%7CtL%2C1%7CtM%2C0&clientProtocol=1.4&connectionToken=UEBdftp4dljKH%2Fw56kprFeB7pnlcXiEv6OqR7mKdzYGoT48tuUFIahljyCEjdyaOn%2BqbxkERLSzuO3QA%2Bwh4BrWIKWlE4rzLhzJnDPedGyOo0Yar2KU7QLCWphtOeava&connectionData=%5B%7B%22name%22%3A%22insuranceofferrequesthub%22%7D%5D&tid=4&_=1572591981773" 
header = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"

def get_total_items(url):
    soup = BeautifulSoup(requests.get(url, headers={"User-Agent":header}).text, 'lxml')
    return soup.text

print(get_total_items(url))

Output:

{"C":"d-D7589F50-A,0|tK,12|tL,1|tM,0","M":[{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":41,"companydisplayname":"Sønderjysk Forsikring","message":"Kan ikke matche dine behov"}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":33,"companydisplayname":"NEM Forsikring A/S","message":"Kan ikke matche dine behov"}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":17,"companydisplayname":"If Skadeforsikring","produktId":236,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":88.03631578947369,"stars":9,"basicprice":4938,"discountedprice":4670,"selvrisiko":5000,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4938,"discountedprice":4670}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":15,"companydisplayname":"Topdanmark","produktId":190,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.119210526315811,"stars":10,"basicprice":8360,"discountedprice":7003,"selvrisiko":3927,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":true,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":8360,"discountedprice":7003}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":18,"companydisplayname":"Alm. Brand","produktId":228,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":93.036973684210551,"stars":10,"basicprice":4252,"discountedprice":3633,"selvrisiko":6324,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4252,"discountedprice":3633}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":43,"companydisplayname":"OK Forsikring","produktId":345,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.617894736842118,"stars":10,"basicprice":6473,"discountedprice":6473,"selvrisiko":4982,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6473,"discountedprice":6473}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":20,"companydisplayname":"GF Forsikring","produktId":279,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.617894736842118,"stars":10,"basicprice":6737,"discountedprice":6737,"selvrisiko":4982,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6737,"discountedprice":6737}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":10,"companydisplayname":"Nykredit Forsikring A/S","produktId":212,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":94.88828947368421,"stars":10,"basicprice":5215,"discountedprice":4707,"selvrisiko":7100,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":5215,"discountedprice":4707}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch","A":[{"companyId":32,"companydisplayname":"NEXT Forsikring A/S","message":"Kan ikke matche dine behov"}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":19,"companydisplayname":"Gjensidige Forsikring A/S","produktId":123,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":94.88828947368421,"stars":10,"basicprice":5215,"discountedprice":4707,"selvrisiko":7100,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":5215,"discountedprice":4707}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":13,"companydisplayname":"Runa Forsikring","produktId":155,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":34,"companydisplayname":"PenSam","produktId":308,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":91.266184210526319,"stars":10,"basicprice":4691,"discountedprice":4691,"selvrisiko":5493,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":true,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":false,"name":"Fastpris"}]}],"basicprice":4691,"discountedprice":4691}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":11,"companydisplayname":"Lærerstandens Brandforsikring","produktId":153,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":12,"companydisplayname":"Bauta Forsikring","produktId":154,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":92.388289473684239,"stars":10,"basicprice":999999,"discountedprice":3877,"selvrisiko":6110,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":true,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":999999,"discountedprice":3877}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":25,"companydisplayname":"Alka Forsikring","produktId":130,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":90.658684210526332,"stars":10,"basicprice":6151,"discountedprice":6151,"selvrisiko":6000,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":6151,"discountedprice":6151}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":24,"companydisplayname":"Tryg","produktId":252,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":95.687631578947389,"stars":10,"basicprice":7324,"discountedprice":5884,"selvrisiko":5833,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":7324,"discountedprice":5884}]},{"H":"InsuranceOfferRequestHub","M":"ReceiveOffer","A":[{"offers":[{"companyId":44,"companydisplayname":"FDM Forsikring","produktId":365,"produktType":"BilMedKasko","resultid":"56dadf55-357d-4372-aacf-6377e3b092de","coveragequality":95.687631578947389,"stars":10,"basicprice":4227,"discountedprice":4227,"selvrisiko":5833,"addtionalOptions":[{"sequence":1,"chosen":true,"name":"Kasko"},{"sequence":2,"chosen":false,"name":"Friskade"},{"sequence":3,"chosen":false,"name":"Udvidet glasdækning"},{"sequence":4,"chosen":false,"name":"Førerdækning"},{"sequence":5,"chosen":false,"name":"Vejhjælp"},{"sequence":6,"chosen":true,"name":"Fastpris"}]}],"basicprice":4227,"discountedprice":4227}]}]}
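The printed text is not HTML but a JSON envelope returned by a SignalR long-polling endpoint. Judging only from the sample above (and assuming the usual SignalR 2.x message layout), "C" is a message cursor and "M" is a list of hub invocations, each carrying a hub name "H", a method name "M" and an argument list "A". A minimal sketch that tallies the invocations by method, run on a trimmed copy of the response (count_methods is a hypothetical helper name):

```python
import json
from collections import Counter

# Trimmed copy of the response shown above: one ReceiveNoMatch, one ReceiveOffer.
SAMPLE = '''{"C":"d-D7589F50-A,0|tK,12|tL,1|tM,0","M":[
 {"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch",
  "A":[{"companyId":41,"companydisplayname":"Sønderjysk Forsikring","message":"Kan ikke matche dine behov"}]},
 {"H":"InsuranceOfferRequestHub","M":"ReceiveOffer",
  "A":[{"offers":[{"companyId":17,"companydisplayname":"If Skadeforsikring",
                   "basicprice":4938,"discountedprice":4670,"selvrisiko":5000}],
        "basicprice":4938,"discountedprice":4670}]}]}'''


def count_methods(payload: dict) -> Counter:
    """Count the hub invocations in a poll payload by their method name ("M")."""
    return Counter(msg["M"] for msg in payload.get("M", []))


data = json.loads(SAMPLE)
print(count_methods(data))  # Counter({'ReceiveNoMatch': 1, 'ReceiveOffer': 1})
```

Only the "ReceiveOffer" invocations carry prices, which is why a plain text search over the whole string is awkward.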

Update: I also tried the following function:

def get_total_items(url):
    soup = BeautifulSoup(requests.get(url, headers={"User-Agent":header}).text, 'lxml')
    blacklist= ["companyId", "basicprice", "discountprice", "selvrisiko"]
    text_ele = [t.text for t in soup if t.name in blacklist]
    return text_ele  

print(get_total_items(url))

It did not work.
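A likely reason the attempt returns nothing: BeautifulSoup parses the response as HTML, and since the body is JSON, lxml most likely wraps the whole text in html/body/p as a single text node. Iterating over soup then yields only the html tag, and no tag is ever named companyId or basicprice. Note also that the payload spells the key "discountedprice", not "discountprice", and that the list (named blacklist) is really a list of keys to keep. The same idea works once it is applied to parsed JSON objects; a sketch, where WANTED and pick are hypothetical names:

```python
import json

# Keys to keep from each offer; the payload spells it "discountedprice".
WANTED = ("companydisplayname", "basicprice", "discountedprice", "selvrisiko")


def pick(obj: dict, keys=WANTED) -> dict:
    """Return only the wanted keys that are present in one parsed JSON object."""
    return {k: obj[k] for k in keys if k in obj}


offer = json.loads(
    '{"companyId":17,"companydisplayname":"If Skadeforsikring","produktId":236,'
    '"basicprice":4938,"discountedprice":4670,"selvrisiko":5000}'
)
print(pick(offer))
# {'companydisplayname': 'If Skadeforsikring', 'basicprice': 4938, 'discountedprice': 4670, 'selvrisiko': 5000}
```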

python-3.x web-scraping beautifulsoup
1 answer

Votes: 0

The response is JSON, not HTML, so parse it into a JSON object instead of keeping it as a string:

import json

result = json.loads(get_total_items(url))
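json.loads turns the text into nested Python dicts and lists, after which every field is plain indexing (with requests, calling .json() on the response object does the same parsing without going through BeautifulSoup). A minimal sketch on a shortened copy of the question's output; the raw string is trimmed by hand:

```python
import json

# Shortened copy of the text returned by get_total_items(url) in the question.
raw = ('{"C":"d-D7589F50-A,0|tK,12|tL,1|tM,0","M":[{"H":"InsuranceOfferRequestHub",'
       '"M":"ReceiveNoMatch","A":[{"companyId":41,'
       '"companydisplayname":"Sønderjysk Forsikring",'
       '"message":"Kan ikke matche dine behov"}]}]}')

result = json.loads(raw)  # str -> dict
print(type(raw).__name__, "->", type(result).__name__)  # str -> dict
print(result["M"][0]["M"])  # ReceiveNoMatch
```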

To access an element of the JSON object (this is only an example of parsing it; you will need to add loops that match the structure of the JSON):

result['M'][0]['A'][0]['companydisplayname']
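The loop could look like the sketch below. It assumes, based only on the sample output, that offers arrive in "ReceiveOffer" invocations under A[i]["offers"] and that "ReceiveNoMatch" invocations carry no prices; extract_offers is a hypothetical helper, and SAMPLE is trimmed from the question's output:

```python
import json

# Trimmed copy of the question's output (one no-match, two offers).
SAMPLE = '''{"C":"d-D7589F50-A,0|tK,12|tL,1|tM,0","M":[
 {"H":"InsuranceOfferRequestHub","M":"ReceiveNoMatch",
  "A":[{"companyId":33,"companydisplayname":"NEM Forsikring A/S","message":"Kan ikke matche dine behov"}]},
 {"H":"InsuranceOfferRequestHub","M":"ReceiveOffer",
  "A":[{"offers":[{"companyId":17,"companydisplayname":"If Skadeforsikring",
                   "basicprice":4938,"discountedprice":4670,"selvrisiko":5000}]}]},
 {"H":"InsuranceOfferRequestHub","M":"ReceiveOffer",
  "A":[{"offers":[{"companyId":24,"companydisplayname":"Tryg",
                   "basicprice":7324,"discountedprice":5884,"selvrisiko":5833}]}]}]}'''


def extract_offers(payload: dict) -> list:
    """Flatten every offer of every ReceiveOffer invocation into a tuple row."""
    rows = []
    for msg in payload.get("M", []):
        if msg.get("M") != "ReceiveOffer":
            continue  # e.g. ReceiveNoMatch: this company returned no offer
        for arg in msg.get("A", []):
            for offer in arg.get("offers", []):
                rows.append((
                    offer["companydisplayname"],
                    offer["basicprice"],
                    offer["discountedprice"],
                    offer["selvrisiko"],
                ))
    return rows


for row in extract_offers(json.loads(SAMPLE)):
    print(row)
# ('If Skadeforsikring', 4938, 4670, 5000)
# ('Tryg', 7324, 5884, 5833)
```

Since the question already imports pandas, pd.DataFrame(rows, columns=["company", "basicprice", "discountedprice", "selvrisiko"]) would turn these rows into a table.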

I suggest using the requests and json modules in Python.

© www.soinside.com 2019 - 2024. All rights reserved.