Using a boolean to run different XPath expressions with Python lxml


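I am trying to scrape weather data from a website with a Python script and lxml. The wind speed data is pulled out and appended to a list for later processing. I can get the information I need when the markup is formatted like this: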

<div class = "day-fcst">
  <div class = "wind">
    <div class = "gust">
      "Gusts to 20-30mph"
    </div>
  </div>
</div>

However, when winds are low, the site adds a child span with a class under the "gust" div, like this:

<div class = "gust">
  <span class = "nowind">
    "Gusts less than 20mph"
  </span>
</div>

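My thinking was to check whether the span exists: if it does, run an XPath expression that pulls the text under the span; otherwise, run an XPath expression that pulls the text directly under the "gust" div. I searched for examples of the XPath boolean function but could not get any of them to work (neither in Safari's Web Inspector nor in my script).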

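My current code uses Python to check whether the span's class equals "nowind" and then runs an if/else statement, but only the else branch ever executes. My current code looks like this: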

from lxml import html
import requests

wind = []

source=requests.get('website')
tree = html.fromstring(source.content)

if tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/@class') == 'nowind':
  wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/text()'))
else:
  wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/text()'))

print(wind)

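I would like to solve this with an XPath expression that yields a boolean instead of my current workaround. Any help would be appreciated. I am still new to XPath, so I am not familiar with any of its functions.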

python xpath web-scraping lxml boolean-operations
1 answer

The same XPath expression works for both cases. Just use //div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()
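
One caveat: if the page is indented the way the snippets in the question are, //text() will also return whitespace-only text nodes around the span. A minimal sketch of how that could be handled, assuming the markup from the question (the helper names gust_strings and gust_text are made up for this example):

```python
from lxml import html

# Low-wind markup as shown in the question, including its indentation.
low_wind = '''
<div class = "day-fcst">
  <div class = "wind">
    <div class = "gust">
      <span class = "nowind">
        "Gusts less than 20mph"
      </span>
    </div>
  </div>
</div>
'''.strip()

GUST = '//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]'


def gust_strings(tree):
    """Hypothetical helper: stripped, non-empty text nodes under the gust div."""
    return [t.strip() for t in tree.xpath(GUST + '//text()') if t.strip()]


def gust_text(tree):
    """Hypothetical helper: let XPath trim and collapse the whitespace itself."""
    # normalize-space() uses the string value of the first matching gust div.
    return tree.xpath('normalize-space(' + GUST + ')')


tree = html.fromstring(low_wind)
print(gust_strings(tree))
print(gust_text(tree))
```

Note that normalize-space() only looks at the first matching div, so on a page with several forecast days the list-based helper is probably the safer choice.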

Alternatively, you can select the <div class = "gust"> element and then use its text_content() method to get the text.

In [1]: from lxml import html

In [2]: first_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust">"Gusts to 20-30mph"</div></div></div>'

In [3]: second_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust"><span class = "nowind">"Gusts to 20-30mph"</span></div></div></div>'

In [4]: f = html.fromstring(first_html)

In [5]: s = html.fromstring(second_html)

In [6]: f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[6]: '"Gusts to 20-30mph"'

In [7]: s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[7]: '"Gusts to 20-30mph"'

In [8]: print(f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']

In [9]: print(s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']
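
As for why only the else branch runs in the question: xpath() returns a list for a node-set result, and a list never compares equal to the string 'nowind'. If a boolean test is still wanted, wrapping the path in the XPath boolean() function makes lxml return a Python bool. A rough sketch, assuming the markup from the question (has_nowind and the sample strings are made up for this example):

```python
from lxml import html

GUST = '//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]'


def has_nowind(tree):
    """Hypothetical helper: True if the gust div has a span with class "nowind"."""
    # boolean() turns the node-set into an XPath boolean, returned as a Python bool.
    return tree.xpath('boolean(' + GUST + '/span[@class = "nowind"])')


windy = html.fromstring(
    '<div class = "day-fcst"><div class = "wind">'
    '<div class = "gust">"Gusts to 20-30mph"</div></div></div>'
)
calm = html.fromstring(
    '<div class = "day-fcst"><div class = "wind"><div class = "gust">'
    '<span class = "nowind">"Gusts less than 20mph"</span></div></div></div>'
)

# The comparison from the question: a list is compared with a string.
classes = calm.xpath(GUST + '/span/@class')
print(classes == 'nowind')    # False, so the if branch never runs
print(classes == ['nowind'])  # True

print(has_nowind(calm))   # True
print(has_nowind(windy))  # False
```

With a single expression such as //text() or text_content() the branch is not needed at all, so this is mainly useful if the two cases have to be treated differently later on.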