Questions tagged beautifulsoup

I am trying to access the src of an image using BeautifulSoup in Python. This is how the image is nested:

    <div class="artistAndEventInfo-7c13900b">
      <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele?came_from=257&utm_medium=web&utm_source=artist_event_page&utm_campaign=artist">
        <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
      </a>

I have tried three approaches.

1: The logic is that I select the parent div of the image and then the child img inside it:

    image = soup.select_one('[class^=artistAndEventInfo-7c13900b] img')
    print "band image", image

This prints "None". (It should output the src.)

2: Using a more explicit nth-of-type approach:

    image = soup.select_one('[class^=artistAndEventInfo-7c13900b] :nth-of-type(1) img')

But the output is still "None".

3: I also tried Selenium:

    driver.find_element_by_xpath("//div[@class^=artistAndEventInfo-48455a81']")

This gives me the error:

    selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //div[@class^=artistAndEventInfo-7c13900b']/img because of the following error:
    SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//div[@class^=artistAndEventInfo-7c13900b']/img' is not a valid XPath expression.
      (Session info: chrome=74.0.3729.157)
      (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Mac OS X 10.11.6 x86_64)

Why does my code fail in all of these cases?

Your XPath looks wrong. XPath has no ^= operator (that is CSS attribute-selector syntax), and the opening quote before the class value is missing.

    //div[@class^=artistAndEventInfo-7c13900b']/img

should be

    //div[@class='artistAndEventInfo-7c13900b']//img

If you want the src of the image, use the code below with the corrected XPath:

    print(driver.find_element_by_xpath("//div[@class='artistAndEventInfo-7c13900b']//img").get_attribute("src"))

If you want to use options 1 and 2, make sure you read the src attribute like this:

    print(image['src'])

With BeautifulSoup, you can do this:

    from bs4 import BeautifulSoup

    html = '''
    <div class="artistAndEventInfo-7c13900b">
      <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele?came_from=257&utm_medium=web&utm_source=artist_event_page&utm_campaign=artist">
        <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
      </a>
    '''

    soup = BeautifulSoup(html, 'html5lib')
    img = soup.find('img')
    src = img['src']
    print(src)

The class attribute value of your div tag may be dynamic. Instead of using the full class attribute value, you can try the following:

    from bs4 import BeautifulSoup

    html = '''<div class="artistAndEventInfo-7c13900b">
      <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele?came_from=257&utm_medium=web&utm_source=artist_event_page&utm_campaign=artist">
        <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
      </a>'''

    soup = BeautifulSoup(html, 'lxml')
    image = soup.select_one('div[class^=artistAndEventInfo-] img')
    print(image['src'])
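As a minimal sketch of the prefix-matching approach from the last answer, the snippet below reads src defensively and returns None when nothing matches instead of raising. The helper name first_image_src and its default selector are illustrative assumptions, the href is shortened, and the built-in html.parser is used so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (the href query string is dropped).
html = '''
<div class="artistAndEventInfo-7c13900b">
  <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele">
    <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
  </a>
</div>
'''


def first_image_src(markup, selector='div[class^="artistAndEventInfo-"] img'):
    """Return the src of the first element matching selector, or None.

    Hypothetical helper for illustration; it is not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    image = soup.select_one(selector)  # select_one returns None when nothing matches
    if image is None:
        return None
    return image.get("src")  # .get avoids a KeyError if the attribute is missing


print(first_image_src(html))                    # https://assets.bandsintown.com/images/fallbackImage.png
print(first_image_src("<p>no image here</p>"))  # None
```

If the selector matches a static snippet like this one but still returns None on the live page, the HTML the script received may differ from what the browser shows, so it can help to inspect the fetched markup before changing the selector.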

python html selenium-webdriver xpath beautifulsoup

Answers: 4 Votes: 0

TypeError: some keyword arguments unexpected

I am trying to write a parser for a page, and I am using the LxmlSoup library. So here is the setup: html = requests.get('https://www.mcdonalds.com/ua/uk-ua/eat/fullmenu.html').text soup = LxmlSoup(html) url = soup...

python python-3.x parsing beautifulsoup lxml

Answers: 1 Votes: 0

Why is the scraped HTML different from the elements inspected in the browser?

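I am currently working on a web scraping project and have run into a problem while scraping data from https://Foundersfund.com/portfolio. I managed to retrieve all the links to each company page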

python html web-scraping beautifulsoup python-requests

Answers: 1 Votes: 0

I use requests and beautifulsoup to scrape a web page; why is the HTML in my program different from the one in inspect element?

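I am currently working on a web scraping project and have run into a problem while scraping data from https://Foundersfund.com/portfolio. I managed to retrieve all the links to each company page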

python html web-scraping beautifulsoup python-requests

Answers: 1 Votes: 0

Python BeautifulSoup - How to convert nested elements into indented text

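I was wondering if anyone could help me understand how to take a website scrape made with BeautifulSoup and Python and convert it into a text file. This is from a message board where people write their own text...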

python web-scraping beautifulsoup

Answers: 1 Votes: 0

Python BS4: navigating multiple HTML tags

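Disclaimer: this is a school assignment. I am not asking for a full solution, only the BS4 part. "I need to scrape 3 different websites that contain research papers and, I believe, authors (contributors). I need to...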

python beautifulsoup

Answers: 1 Votes: 0

Parsing multiple Python dataframes into a single object

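I am trying to loop through multiple pages of a website (2 pages in this example), scrape the relevant customer review data, and finally combine everything into a single dataframe. The challenge...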

python dataframe for-loop web-scraping beautifulsoup

Answers: 1 Votes: 0

Python web scraping with BeautifulSoup

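I am fairly new to Python but enjoy learning new things. I want to create a Python script that returns the electricity price, and I want to feed this value into my home automation system (op...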

python beautifulsoup

Answers: 1 Votes: 0

Errno 2 No such file or directory: 'website_content.txt'

I am getting a file-not-found error, the one in the title. I am a beginner. Can someone help? import requests import time from bs4 import BeautifulSoup headers = {"User-Agent": "My...

python-3.x beautifulsoup python-requests

Answers: 1 Votes: 0

Getting the a href text value and the h3 text value using Selenium and/or BS4

I am trying to get the text values contained in this HTML:

    <div> <div> <div class="py d-flex align-items-start align-items-lg-center css-17435dd row" data-test="salaries-list-item-0" data-brandviews="MODULE:n=salaries-search-salaries-by-company:eid=7927:uid=0:salary_job_title_id=37856" data-triggered-brandview=""><div class="col-12 col-md-3 col-lg-3"><div class="d-flex" data-test="employer-info"><div class="employerLogo mr"><a href="/Salary/Infosys-Project-Manager-United-States-Salaries-EJI_IE7927.0,7_KO8,23_IL.24,37.htm" data-test="salaries-list-item-0-employer-url"><img alt="Infosys" src="https://media.glassdoor.com/sql/7927/infosys-squareLogo-1620208556721.png" class="css-1rc0f2z e1aj7ssy1"></a></div><div class="employerStats"><a href="/Salary/Infosys-Project-Manager-United-States-Salaries-EJI_IE7927.0,7_KO8,23_IL.24,37.htm" data-test="salaries-list-item-0-employer-url" class="css-1ikln7a el6ke052">Infosys</a><div class="d-flex align-items-center mt-xxsm"><span class="css-h9sogr m-0 css-60s9ld el6ke050" color="#0caa41">3.2</span><span data-color-variant="primary" data-size-variant="sm" class="gd-ui-star css-h9sogr ml-xxsm css-ojdxng efdhmtv0" role="presentation"><svg aria-hidden="true" class="css-pq72fl e7xsrz90" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><defs><path id="star_svg__star-multicolor-icon-path" d="m12 2.694 2.119 6.857.109.352h7.294L15.898 14.2l-.277.212.103.333 2.135 6.909-5.556-4.244-.303-.232-.303.232-5.556 4.244 2.135-6.909.103-.333-.277-.212-5.624-4.296h7.295l.108-.352L12 2.694Z"></path><clipPath id="star_svg__star-multicolor-icon-clip"><use href="#star_svg__star-multicolor-icon-path"></use></clipPath></defs><use href="#star_svg__star-multicolor-icon-path" clip-path="url(#star_svg__star-multicolor-icon-clip)" fill="var(--icon-color-a, currentColor)" stroke="var(--icon-color-b, currentColor)" stroke-width="var(--icon-stroke-width, 1px)"></use></svg></span></div><span class="d-flex align-items-middle css-1in2cw4 el6ke050">Project Manager<span class="SVGInline"><svg class="SVGInline-svg" xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24"><path d="M12 9a1 1 0 101 1 1 1 0 00-1-1zm0 3a1 1 0 00-1 1v2a1 1 0 002 0v-2a1 1 0 00-1-1zm0-6a6 6 0 11-6 6 6 6 0 016-6zm0 11a5 5 0 10-5-5 5 5 0 005 5z" fill="currentColor" fill-rule="evenodd"></path></svg></span></span><a class="m-0 d-flex css-259095 e1aj7ssy6" href="/Salary/Infosys-Project-Manager-Salaries-EI_IE7927.0,7_KO8,23.htm" data-test="salaries-list-item-0-salary-count"><span>See 23,491 salaries from this location</span></a><div class="d-block d-lg-none"><a href="/Jobs/Infosys-Jobs-E7927.htm" data-test="salaries-list-item-0-open-jobs-link" class="css-259095 e1aj7ssy6"><span>1,744 open jobs</span></a></div></div></div></div><div class="col-12 col-md-3 col-lg-3 d-none d-lg-block align-self-start"><div class="d-flex" data-test="employer-info"><a href="/Jobs/Infosys-Jobs-E7927.htm" data-test="salaries-list-item-0-open-jobs-link" class="css-259095 e1aj7ssy6"><span>1,744 open jobs</span></a></div></div><div data-test="range-bar" class="col-12 col-lg-6 order-1 order-md-2 order-lg-1 pt-xl pt-lg-0"><div class="d-flex flex-column"><div class="order-0 mt-xl css-1ex4vn2 e1hxjh2q1"><div class="css-1xxwalr e1hxjh2q3"><div class="css-79elbk e13r6qcv0"><div class="mb-xxsm d-flex align-items-baseline css-3tpw1n e13r6qcv1"><h3 class="m-0 css-16zrpia el6ke054">$118,568</h3><span class="m-0 css-1in2cw4 el6ke050"> / yr</span></div></div></div></div><div class="order-2 css-79elbk e1hxjh2q0" style="visibility: visible; min-height: 24px;"><div class="css-osz7jh e13r6qcv4"><div class="css-79elbk e1hxjh2q0"><div class="css-ituc4g e13r6qcv5"><div class="d-flex col css-79elbk e1hxjh2q0"><div class="d-flex flex-column align-items-end css-15o6gsn e13r6qcv5"><span class="m-0 css-1in2cw4 el6ke050">$104K</span></div><div class="d-flex flex-column align-items-end css-1qfy6mj e13r6qcv5"><span class="m-0 css-1in2cw4 el6ke050">$135K</span></div></div></div></div></div></div><div class="order-1 my-xxsm css-psber1 e13r6qcv3"><div class="css-32y0el e13r6qcv2"></div><div class="css-x87ns5 e13r6qcv4"></div></div></div></div></div>

So far I have tried BS4, but the resulting value is "None":

    source = BeautifulSoup(driver.page_source, 'html.parser')
    data_test_param = 'salaries-list-item-' + str(x) + '-employer-url'
    a_value = source.find('a', attrs={'data-test': data_test_param})
    h3_value = source.find_all('h3', {'class': "m-0 css-16zrpia el6ke054"})

Could you please advise?

Looking at the HTML, you can do the following:

    from bs4 import BeautifulSoup

    # html_text holds the full HTML snippet from the question
    # (not repeated here; paste it between the triple quotes).
    html_text = """..."""

    soup = BeautifulSoup(html_text, "html.parser")

    for h3 in soup.select("h3"):
        # find_previous returns the closest <a> that precedes the <h3> in the document
        url = h3.find_previous("a")["href"]
        amount = h3.text
        print(url)
        print(amount)
        print()

Prints:

    /Jobs/Infosys-Jobs-E7927.htm
    $118,568
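The answer above returns the nearest preceding link, which in this markup is the open-jobs URL rather than the employer link. As a hedged sketch of an alternative, the snippet below walks each list item by its data-test attribute and picks the employer link that actually carries text. The trimmed html_text, the ROW_PATTERN regex, and the extract_salaries helper are assumptions made for illustration; the attribute names and values are taken from the question's HTML.

```python
import re

from bs4 import BeautifulSoup

# Heavily trimmed version of the question's markup: one list item with two
# employer-url links (logo and name), an open-jobs link, and the salary <h3>.
html_text = """
<div data-test="salaries-list-item-0">
  <a href="/Salary/Infosys-Project-Manager-United-States-Salaries-EJI_IE7927.0,7_KO8,23_IL.24,37.htm" data-test="salaries-list-item-0-employer-url"><img alt="Infosys" src="https://media.glassdoor.com/sql/7927/infosys-squareLogo-1620208556721.png"></a>
  <a href="/Salary/Infosys-Project-Manager-United-States-Salaries-EJI_IE7927.0,7_KO8,23_IL.24,37.htm" data-test="salaries-list-item-0-employer-url">Infosys</a>
  <a href="/Jobs/Infosys-Jobs-E7927.htm" data-test="salaries-list-item-0-open-jobs-link"><span>1,744 open jobs</span></a>
  <h3 class="m-0 css-16zrpia el6ke054">$118,568</h3>
</div>
"""

# Matches the row container only (e.g. "salaries-list-item-0"), not its children.
ROW_PATTERN = re.compile(r"^salaries-list-item-\d+$")


def extract_salaries(markup):
    """Return a list of (employer, href, amount) tuples, one per list item."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for item in soup.find_all("div", attrs={"data-test": ROW_PATTERN}):
        # Both the logo link and the name link share this data-test suffix.
        links = item.select('a[data-test$="-employer-url"]')
        texts = [a.get_text(strip=True) for a in links]
        employer = next((t for t in texts if t), None)  # first link with visible text
        href = links[0].get("href") if links else None
        h3 = item.find("h3")
        amount = h3.get_text(strip=True) if h3 is not None else None
        rows.append((employer, href, amount))
    return rows


for employer, href, amount in extract_salaries(html_text):
    print(employer, href, amount)
```

Scoping the search to each item keeps the employer name and the salary paired even when the page lists several rows, and avoids depending on the generated css-* class names.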

python selenium-webdriver beautifulsoup

Answers: 1 Votes: 0

Scraping Google search results with Python BeautifulSoup

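I have a Google query that shows 8000 results with links. I only want to scrape the links (URLs) in the search results. I am able to get the links on the first page; is there any way to scrape the next pages? He...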

python web-scraping beautifulsoup google-search-api

Answers: 1 Votes: 0

Python "Illegal instruction (core dumped)" when importing certain libraries (beautifulsoup4, yfinance)

I am using Python 3.10 on Ubuntu 22.04.4, and I am trying to run code that I originally wrote on a Windows 11 machine. Whenever I run this script, main.py, it always stops at the import stage, and...

python python-3.x ubuntu beautifulsoup yfinance

Answers: 1 Votes: 0

Scraping information from a span that contains a nested span

I want to get live weather data by web scraping, and I am thinking of using BeautifulSoup for this.

    <span class="Column--precip--3JCDO">
      <span class="Accessibility--visuallyHidden--H7O4p">Chance of Rain</span>
      3%
    </span>

I want to extract the 3% from this container. I have already managed to get data from another part of the website with this snippet:

    temp_value = soup.find("span", {"class": "CurrentConditions--tempValue--MHmYY"}).get_text(strip=True)

I tried the same for rain_forecast:

    rain_forecast = soup.find("span", {"class": "Column--precip--3JCDO"}).get_text(strip=True)

But the output in my console for print(rain_forecast) is "--". The only difference I can see is that the text I want from the span sits next to another, nested span. Another approach I came across on Stack Overflow is to use Selenium, on the grounds that the data has not been loaded yet when the page is fetched, which is why the output is "--". But I do not know whether that is overkill for my application or whether there is a simpler solution.

If you want to get the forecast table for today, you can use this example:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    headers = {"User-Agent": "Mozilla/5.0"}
    url = "https://weather.com/en-IN/weather/today/l/a0e0a5a98f7825e44d5b44b26d6f3c2e76a8d70e0426d099bff73e764af3087a"
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

    today_forecast = []
    for a in soup.select(".TodayWeatherCard--TableWrapper--globn a"):
        # one row per link: the text of each direct child element
        today_forecast.append(
            [t.get_text(strip=True, separator=" ") for t in a.find_all(recursive=False)]
        )

    df = pd.DataFrame(
        today_forecast, columns=["Time of day", "Degrees", "Text", "Chance of rain"]
    )
    print(df)

Prints:

      Time of day Degrees Text Chance of rain
    0 Morning 11 ° Partly Cloudy --
    1 Afternoon 20 ° Partly Cloudy --
    2 Evening 14 ° Partly Cloudy Night Rain Chance of Rain 3%
    3 Overnight 10 ° Cloudy Rain Chance of Rain 5%

A second answer:

    from bs4 import BeautifulSoup

    # Assuming you have your HTML content in 'html_content'
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find the parent span and extract the text, excluding the nested span's text
    rain_forecast = soup.find("span", {"class": "Column--precip--3JCDO"}).contents[-1].strip()
    print(rain_forecast)
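The idea in the second answer, taking only the text that belongs to the outer span itself, can also be sketched without relying on the position of the last child. The snippet below is an illustration under assumptions: html_content is a static copy of the span from the question, and own_text is a hypothetical helper name. It only helps when the value is present in the fetched HTML; if the server sends "--" and the number is filled in later by JavaScript, no parsing change will recover it.

```python
from bs4 import BeautifulSoup

# Static copy of the span from the question, used here instead of a live request.
html_content = """
<span class="Column--precip--3JCDO">
  <span class="Accessibility--visuallyHidden--H7O4p">Chance of Rain</span>
  3%
</span>
"""


def own_text(tag):
    """Join only the strings that are direct children of tag, skipping nested tags."""
    return "".join(tag.find_all(string=True, recursive=False)).strip()


soup = BeautifulSoup(html_content, "html.parser")
precip = soup.find("span", {"class": "Column--precip--3JCDO"})

# Guard against a missing element so a layout change does not raise AttributeError.
rain_forecast = own_text(precip) if precip is not None else None
print(rain_forecast)  # 3%
```

Compared with contents[-1], this does not assume the percentage is the final node inside the span.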

python web-scraping beautifulsoup

Answers: 2 Votes: 0
