Finding a specific HTML tag in BeautifulSoup

Question (votes: 0, answers: 2)

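I've been struggling to find the right soup.select_one or find_next combination to locate the zestimate tag below. Can you help me work out the BeautifulSoup code for this?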

Here is the URL:

https://www.zillow.com/homedetails/8612-Silverthorne-St-Austin-TX-78744/251036192_zpid/

I'm trying to return: $486,997

<div id="home-details-home-values">
   <h2>Home Value</h2>
   <div class="zestimate-summary">
      <div class="zsg-content-component zestimate-above-toggle">
         <div class="primary-zestimate-item">
            <div>
               <div class="title zsg-h3 zsg-content_collapsed"><span tabindex="0" role="button"><span class="ds-dashed-underline">Zestimate</span></span></div>
               <div class="content">
                  <div class="zestimate-value">$486,997</div>
               </div>
            </div>
            <div class="left-spacer"></div>
            <div class="right-spacer"></div>
            <div class="zillow-offers-upsell-wrapper">
               <div class="sc-kgoBCf pnJxW">
                  <div class="zsg-h3 zsg-content_collapsed">Zillow Offer</div>
                  <a href="/offers/?t=omhdp-zestimate&amp;zpid=251036192">Get your Zillow Offer</a>
               </div>
            </div>
         </div>
         <div class="secondary-zestimate-items">
            <div class="zsg-lg-1-3 zsg-md-1-1 secondary-row">
               <span class="zestimate-icon"><img src="data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIHZpZXdCb3g9IjAgMCA1NiA1NiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayI+PHRpdGxlPlplc3RpbWF0ZV9SYW5nZTwvdGl0bGU+PGRlZnM+PGVsbGlwc2UgaWQ9ImEiIGN4PSIyOCIgY3k9IjI4IiByeD0iMjgiIHJ5PSIyOCIvPjxtYXNrIGlkPSJjIiB4PSIwIiB5PSIwIiB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2EiLz48L21hc2s+PHBhdGggZD0iTTIzLjgwNCAxMy41MDF2MTAuNTExYzAgLjY0OC0uMzI1IDEuNTEyLTEuNTEzIDEuNTEyaC01Ljk0VjE0Ljc2MmgtNS45NHYxMC43NjJINC40N2MtMS4xODggMC0xLjUxMi0uODY0LTEuNTEyLTEuNTEydi0xMC41MUguNThjLS44NjQgMC0uNjQ4LS40MzMtLjEwOC0xLjA4TDEyLjM1NC40MzFjLjMyNC0uMzI0LjY0OS0uNDMyIDEuMDgtLjQzMi40MzMgMCAuNzU3LjIxNiAxLjA4LjQzMmwxMS44ODIgMTEuOTljLjY0OC42NDcuODY0IDEuMDgtLjEwOCAxLjA4aC0yLjQ4NHoiIGlkPSJiIi8+PG1hc2sgaWQ9ImQiIHg9IjAiIHk9IjAiIHdpZHRoPSIyNi45NSIgaGVpZ2h0PSIyNS41MjQiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2IiLz48L21hc2s+PC9kZWZzPjxnIHN0cm9rZT0iIzAwNzRFNCIgc3Ryb2tlLXdpZHRoPSIyIiBmaWxsPSIjRkZGIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjx1c2UgbWFzaz0idXJsKCNjKSIgeGxpbms6aHJlZj0iI2EiLz48dXNlIG1hc2s9InVybCgjZCkiIHhsaW5rOmhyZWY9IiNiIiB0cmFuc2Zvcm09InRyYW5zbGF0ZSgxNSAxNSkiLz48L2c+PC9zdmc+" role="presentation"></span>
               <div class="secondary-wrapper">
                  <div class="title zsg-h4 zsg-content_collapsed"><span tabindex="0" role="button"><span class="ds-dashed-underline">Zestimate Range</span></span></div>
                  <div class="content">$463,000 - $511,000</div>
               </div>
            </div>
            <div class="zsg-lg-1-3 zsg-md-1-1 secondary-row">
               <span class="zestimate-icon"><img src="data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIHZpZXdCb3g9IjAgMCA1NiA1NiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayI+PHRpdGxlPjMwX0RheXNfRG93bjwvdGl0bGU+PGRlZnM+PGVsbGlwc2UgaWQ9ImEiIGN4PSIyOCIgY3k9IjI4IiByeD0iMjgiIHJ5PSIyOCIvPjxtYXNrIGlkPSJjIiB4PSIwIiB5PSIwIiB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2EiLz48L21hc2s+PHBhdGggZD0iTTI4LjcwNiAxMy43NjVMMTYuNDcgMS41MjlDMTYgMS4wNiAxNS40MS44MjQgMTQuNzA2LjgyNGMtLjcwNiAwLTEuMjk0LjIzNS0xLjY0Ny43MDVMLjcwNiAxMy43NjVjLS40Ny40Ny0uNzA2IDEuMDU5LS43MDYgMS43NjQgMCAuNzA2LjIzNSAxLjE3Ny43MDYgMS42NDdsMS40MTIgMS40MTJjLjQ3LjQ3IDEuMDU4LjcwNiAxLjY0Ny43MDYuNzA2IDAgMS4yOTQtLjIzNSAxLjY0Ny0uNzA2bDUuNTMtNS41M3YxMy4yOTVjMCAuNzA2LjIzNCAxLjE3Ni43MDUgMS42NDdhMi44OSAyLjg5IDAgMCAwIDEuNzY1LjU4OGgyLjQ3QTIuODkgMi44OSAwIDAgMCAxNy42NDcgMjhjLjQ3LS4zNTMuNzA2LS45NDEuNzA2LTEuNjQ3VjEzLjA1OWw1LjUzIDUuNTNjLjQ3LjQ3IDEuMDU4LjcwNSAxLjY0Ni43MDUuNzA2IDAgMS4yOTUtLjIzNSAxLjc2NS0uNzA2bDEuNDEyLTEuNDEyYy40Ny0uNDcuNzA2LTEuMDU4LjcwNi0xLjY0NyAwLS43MDUtLjIzNi0xLjI5NC0uNzA2LTEuNzY0eiIgaWQ9ImIiLz48bWFzayBpZD0iZCIgeD0iMCIgeT0iMCIgd2lkdGg9IjI5LjQxMiIgaGVpZ2h0PSIyNy43NjUiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2IiLz48L21hc2s+PC9kZWZzPjxnIHN0cm9rZT0iIzAwNzRFNCIgc3Ryb2tlLXdpZHRoPSIyIiBmaWxsPSIjRkZGIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjx1c2UgbWFzaz0idXJsKCNjKSIgeGxpbms6aHJlZj0iI2EiLz48dXNlIG1hc2s9InVybCgjZCkiIHhsaW5rOmhyZWY9IiNiIiB0cmFuc2Zvcm09Im1hdHJpeCgxIDAgMCAtMSAxMyA0MykiLz48L2c+PC9zdmc+" role="presentation"></span>
               <div class="secondary-wrapper">
                  <div class="title zsg-h4 zsg-content_collapsed">Last 30 Day Change</div>
                  <div class="content">-$2,830 <span class="percent-decrease">(-0.6 %)</span></div>
               </div>
            </div>
         </div>
      </div>
      <div class="toggle-section">
         <div class="zsg-content-component module-separator hide">
            <div class="additional-zestimate-info zsg-wrapper-body-hidden"></div>
         </div>
         <div class="zsg-content-item"><a class="toggle zsg-lg-1-1 zsg-centered">Zestimate history &amp; details <span class="zsg-icon-expando-down"></span></a></div>
      </div>
   </div>
</div>

Here is the code I'm using:

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

for link in df['links']:
    r = s.get(link, headers=req_headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    #     soup = BeautifulSoup(requests.get(url, headers=req_headers).content, 'html.parser')
    results = soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
    print(results)

Tags: python, html, beautifulsoup, html-parsing

2 Answers

0 votes

I ran a quick test, and it looks like the page asks you to pass a CAPTCHA to verify that you're a human.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.zillow.com/homedetails/8612-Silverthorne-St-Austin-TX-78744/251036192_zpid/")

soup = BeautifulSoup(page.content, 'html.parser')

print(soup.text)

>>>Please verify you're a human to continue.
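
To tell this verification page apart from a real listing before you try any selectors, a small check on the page text may help. This is only a sketch: the marker string is copied from the output above and Zillow could change the wording at any time, and `is_captcha_page` and `CAPTCHA_MARKER` are hypothetical names introduced for illustration.

```python
# Marker taken from the output shown above; the wording is an assumption
# and may change on the live site.
CAPTCHA_MARKER = "verify you're a human"


def is_captcha_page(page_text):
    """Return True if the text looks like the human-verification page."""
    # Normalise the typographic apostrophe and the case so that
    # minor spelling variants of the message still match.
    normalised = page_text.replace("\u2019", "'").lower()
    return CAPTCHA_MARKER in normalised


if __name__ == "__main__":
    print(is_captcha_page("Please verify you're a human to continue."))
    print(is_captcha_page("Home Value $486,997"))
```

In a scraping loop you would pass `soup.text` (or `r.text`) to this function and skip or retry the link when it returns True, rather than letting `select_one` return None further down.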

0 votes

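Based on my answer: it seems Zillow serves users more than one type of page. First check that you are not getting the CAPTCHA page. If you aren't, use this script: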

import requests
from bs4 import BeautifulSoup


url = 'https://www.zillow.com/homedetails/8612-Silverthorne-St-Austin-TX-78744/251036192_zpid/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

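# Some page variants put the value in a <p> following an <h4> that contains "Home value".
# (Newer soupsieve versions prefer the spelling :-soup-contains() over :contains().)
home_value = soup.select_one('h4:contains("Home value")')
if not home_value:
    # Other variants: take the last whitespace-separated token of the .zestimate element.
    home_value = soup.select_one('.zestimate').text.split()[-1]
else:
    home_value = home_value.find_next('p').get_text(strip=True)

print(home_value)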

Prints:

$486,997

For url = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/', it prints:

$324,493

This may need more testing.
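
For the markup quoted in the question, the value sits in a div with the class `zestimate-value` inside `#home-details-home-values`. There is no `h4` or `p` in that fragment, and the heading reads "Home Value" with a capital V, so the original `h4:contains("Home value")` selector would not match it. A minimal sketch of a more direct selector, assuming the page you receive really contains that markup: `SAMPLE_HTML` is a trimmed copy of the question's snippet, and `extract_zestimate` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question, used here instead of a live request.
SAMPLE_HTML = """
<div id="home-details-home-values">
  <h2>Home Value</h2>
  <div class="zestimate-summary">
    <div class="primary-zestimate-item">
      <div class="content">
        <div class="zestimate-value">$486,997</div>
      </div>
    </div>
    <div class="secondary-wrapper">
      <div class="content">$463,000 - $511,000</div>
    </div>
  </div>
</div>
"""


def extract_zestimate(html):
    """Return the Zestimate text, or None if the expected markup is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Scope the class selector to the home-values block to avoid stray matches.
    node = soup.select_one("#home-details-home-values .zestimate-value")
    return node.get_text(strip=True) if node else None


if __name__ == "__main__":
    print(extract_zestimate(SAMPLE_HTML))
```

Returning None instead of raising keeps a loop over many links running when one page turns out to be a CAPTCHA page or a different layout.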
