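I am trying to scrape weather data from a website using a Python script and lxml. The wind speed data will be extracted and appended to a list for later processing. I can get the information I need when the markup looks like this: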
<div class = "day-fcst">
<div class = "wind">
<div class = "gust">
"Gusts to 20-30mph"
</div>
</div>
</div>
However, when winds are light, the site adds a child span with its own class inside the "gust" div, like this:
<div class = "gust">
<span class = "nowind">
"Gusts less than 20mph"
</span>
</div>
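My idea was to check whether the span exists: if it does, run an XPath expression that pulls the text from the span; otherwise, run an XPath expression that pulls the text directly from the "gust" div. I searched for examples that use the XPath boolean functions, but could not get any of them to work (neither in Safari's Web Inspector nor in my script).
My current code uses Python to check whether the span's class equals "nowind" and then runs an if/else, but only the else branch ever executes. Here is my current code: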
from lxml import html
import requests

wind = []
source = requests.get('website')
tree = html.fromstring(source.content)
if tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/@class') == 'nowind':
    wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/text()'))
else:
    wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/text()'))
print(wind)
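I would like to solve this with a single XPath expression that yields a boolean, rather than my current workaround. Any help would be appreciated. I am still new to XPath, so I am not familiar with its functions.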
You can use the same XPath expression for both cases. Just use //div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()
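One thing to watch for with //text(): it returns every text node under the div, so on a real page with line breaks and indentation around the span you will probably get whitespace-only strings as well. Below is a minimal sketch of filtering those out. The helper name gust_texts and the sample markup are made up for illustration, and the real page may be laid out differently.

```python
from lxml import html

# Same expression as above; //text() matches text nodes at any depth under the gust div.
GUST_TEXT = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]//text()'


def gust_texts(fragment):
    """Return the stripped, non-empty text nodes found under each gust div."""
    tree = html.fromstring(fragment)
    return [t.strip() for t in tree.xpath(GUST_TEXT) if t.strip()]


# Hypothetical sample markup modelled on the two cases in the question.
plain = ('<div class="day-fcst"><div class="wind">'
         '<div class="gust">Gusts to 20-30mph</div></div></div>')
nested = ('<div class="day-fcst"><div class="wind"><div class="gust">\n'
          '  <span class="nowind">Gusts less than 20mph</span>\n'
          '</div></div></div>')

wind = []
wind.extend(gust_texts(plain))
wind.extend(gust_texts(nested))
print(wind)
```

Using extend rather than append keeps wind a flat list of strings instead of a list of lists, which is likely easier to work with later.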
Alternatively, you can select the <div class = "gust"> element and then use its text_content() method to get the text content.
In [1]: from lxml import html
In [2]: first_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust">"Gusts to 20-30mph"</div></div></div>'
In [3]: second_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust"><span class = "nowind">"Gusts to 20-30mph"</span></div></div></div>'
In [4]: f = html.fromstring(first_html)
In [5]: s = html.fromstring(second_html)
In [6]: f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[6]: '"Gusts to 20-30mph"'
In [7]: s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[7]: '"Gusts to 20-30mph"'
In [8]: print(f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']
In [9]: print(s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']
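If you do want an XPath expression that evaluates to a boolean, as the question asks, wrapping the path in the XPath 1.0 boolean() function should work: lxml converts boolean results to Python True/False. This also shows why the original if never matched: xpath() with a path ending in @class returns a list such as ['nowind'], and a list never equals the string 'nowind'. This is a rough sketch only; the helper names and the sample markup are hypothetical.

```python
from lxml import html

GUST = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]'


def has_nowind(tree):
    """True if a gust div contains a span with class "nowind"."""
    # boolean(node-set) is true exactly when the node-set is non-empty.
    return tree.xpath('boolean(' + GUST + '/span[@class="nowind"])')


def gust_text(tree):
    """Text of the first gust div, descendants included, whitespace collapsed."""
    # normalize-space() takes the string value of the first matching node.
    return tree.xpath('normalize-space(' + GUST + ')')


# Hypothetical sample markup modelled on the two cases in the question.
windy = html.fromstring(
    '<div class="day-fcst"><div class="wind">'
    '<div class="gust">Gusts to 20-30mph</div></div></div>')
calm = html.fromstring(
    '<div class="day-fcst"><div class="wind"><div class="gust">'
    '<span class="nowind">Gusts less than 20mph</span></div></div></div>')

for tree in (windy, calm):
    print(has_nowind(tree), gust_text(tree))

# The attribute query returns a list, which is why comparing it to a string fails.
print(calm.xpath(GUST + '/span/@class'))
```

In practice the boolean check is only needed if the two cases have to be handled differently; for just collecting the text, the single expression above covers both.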