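I am new to web scraping and am having some trouble getting data from a web page.
I want to read this page: https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018
I tried to get the wind speed data from the div elements with the class wstext, but for some reason the page that the requests library fetches does not contain this class, nor some of its ancestors.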
import requests
import bs4 as bs
import numpy as np
wind = np.random.rand(120)
dailyWindRecord = np.random.rand(30,4)
html = requests.get('https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018')
print(html.text)
soup = bs.BeautifulSoup(html.content, 'html5lib')
print(soup.prettify())
windList = soup.findAll('div')
print(windList)
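I tried printing both the HTML that requests reads directly and the version parsed by BeautifulSoup, to see whether the HTML contains the class, but I could not find it in either. Any help would be appreciated.
pandas can do the work for you, instead of using BS4 or requests: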
import numpy as np
import pandas as pd
wind = np.random.rand(120)
dailyWindRecord = np.random.rand(30,4)
url = 'https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018'
tables = pd.read_html(url)
table = tables[1]
print (table.iloc[:,4])
Output:
print (table.iloc[:,4])
0 3 mph
1 No wind
2 No wind
3 No wind
4 No wind
5 No wind
6 No wind
7 3 mph
8 5 mph
9 6 mph
10 5 mph
11 5 mph
12 6 mph
13 5 mph
14 No wind
15 3 mph
16 No wind
17 No wind
18 No wind
19 No wind
20 5 mph
21 No wind
22 6 mph
23 6 mph
24 5 mph
25 6 mph
26 7 mph
27 7 mph
28 7 mph
29 3 mph
30 3 mph
31 3 mph
32 3 mph
33 No wind
34 3 mph
35 3 mph
36 No wind
37 No wind
38 NaN
Name: (Unnamed: 4_level_0, Wind), dtype: object
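The column above holds text such as "3 mph" and "No wind", plus a NaN in the last row, so it usually needs converting before any arithmetic. A minimal sketch of one way to do that follows; the helper name parse_wind_mph is hypothetical, and treating "No wind" as 0 is an assumption rather than something the page states.

```python
import re
from typing import Optional


def parse_wind_mph(value) -> Optional[float]:
    """Convert a cell such as '3 mph' or 'No wind' into a speed in mph.

    Returns None for anything that is not recognised (for example NaN).
    """
    if not isinstance(value, str):
        return None  # NaN and other non-text cells
    text = value.strip()
    if text.lower() == "no wind":
        return 0.0  # assumption: 'No wind' means a speed of zero
    match = re.match(r"(\d+(?:\.\d+)?)\s*mph$", text)
    return float(match.group(1)) if match else None


# A few cells in the same shape as the output above.
sample = ["3 mph", "No wind", "7 mph", float("nan")]
speeds = [parse_wind_mph(cell) for cell in sample]
print(speeds)
```

With the DataFrame from the answer, the same helper could be applied with table.iloc[:, 4].map(parse_wind_mph).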
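Option 2:
You can find and pull the JSON structure embedded in the HTML, and then work with that. When I tried it, though, the data ran across the whole month in 6-hour blocks, rather than covering a single day hour by hour: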
import numpy as np
import requests
import bs4
import json
wind = np.random.rand(120)
dailyWindRecord = np.random.rand(30,4)
url = 'https://www.timeanddate.com/weather/pakistan/lahore/historic?month=7&year=2018'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'html.parser')
scripts = soup.find_all('script')
jsonObj = None
for script in scripts:
    if 'var data=' in script.text:
        jsonStr = script.text.strip()
        jsonStr = jsonStr.split('var data=')[1]
        jsonStr = jsonStr.split(';')[0]
        jsonObj = json.loads(jsonStr)
for item in jsonObj['detail']:
    date = item['ds']
    wind = item['wind']
    print ('Date: %-40s Wind: %s' %(date,wind) )
Output:
Date: Sunday, 1 July 2018, 00:00 — 06:00 Wind: 0.621
Date: Sunday, 1 July 2018, 06:00 — 12:00 Wind: 3.728
Date: Sunday, 1 July 2018, 12:00 — 18:00 Wind: 3.107
Date: Sunday, 1 July 2018, 18:00 — 00:00 Wind: 3.107
Date: Monday, 2 July 2018, 00:00 — 06:00 Wind: 1.864
Date: Monday, 2 July 2018, 06:00 — 12:00 Wind: 5.593
Date: Monday, 2 July 2018, 12:00 — 18:00 Wind: 8.7
Date: Monday, 2 July 2018, 18:00 — 00:00 Wind: 9.943
Date: Tuesday, 3 July 2018, 00:00 — 06:00 Wind: 10.564
Date: Tuesday, 3 July 2018, 06:00 — 12:00 Wind: 11.185
Date: Tuesday, 3 July 2018, 12:00 — 18:00 Wind: 9.943
Date: Tuesday, 3 July 2018, 18:00 — 00:00 Wind: 6.214
Date: Wednesday, 4 July 2018, 00:00 — 06:00 Wind: 6.836
Date: Wednesday, 4 July 2018, 06:00 — 12:00 Wind: 4.971
Date: Wednesday, 4 July 2018, 12:00 — 18:00 Wind: 6.214
Date: Wednesday, 4 July 2018, 18:00 — 00:00 Wind: 3.728
Date: Thursday, 5 July 2018, 00:00 — 06:00 Wind: 1.864
Date: Thursday, 5 July 2018, 06:00 — 12:00 Wind: 1.864
Date: Thursday, 5 July 2018, 12:00 — 18:00 Wind: 3.107
Date: Thursday, 5 July 2018, 18:00 — 00:00 Wind: 3.107
Date: Friday, 6 July 2018, 00:00 — 06:00 Wind: 1.864
Date: Friday, 6 July 2018, 06:00 — 12:00 Wind: 6.214
Date: Friday, 6 July 2018, 12:00 — 18:00 Wind: 6.836
Date: Friday, 6 July 2018, 18:00 — 00:00 Wind: 3.728
Date: Saturday, 7 July 2018, 00:00 — 06:00 Wind: 1.243
Date: Saturday, 7 July 2018, 06:00 — 12:00 Wind: 2.486
Date: Saturday, 7 July 2018, 12:00 — 18:00 Wind: 6.836
Date: Saturday, 7 July 2018, 18:00 — 00:00 Wind: 2.486
Date: Sunday, 8 July 2018, 00:00 — 06:00 Wind: 3.107
Date: Sunday, 8 July 2018, 06:00 — 12:00 Wind: 6.214
Date: Sunday, 8 July 2018, 12:00 — 18:00 Wind: 5.593
Date: Sunday, 8 July 2018, 18:00 — 00:00 Wind: 4.35
Date: Monday, 9 July 2018, 00:00 — 06:00 Wind: 5.593
Date: Monday, 9 July 2018, 06:00 — 12:00 Wind: 5.593
Date: Monday, 9 July 2018, 12:00 — 18:00 Wind: 6.214
Date: Monday, 9 July 2018, 18:00 — 00:00 Wind: 4.35
Date: Tuesday, 10 July 2018, 00:00 — 06:00 Wind: 6.836
Date: Tuesday, 10 July 2018, 06:00 — 12:00 Wind: 8.078
Date: Tuesday, 10 July 2018, 12:00 — 18:00 Wind: 6.836
Date: Tuesday, 10 July 2018, 18:00 — 00:00 Wind: 5.593
Date: Wednesday, 11 July 2018, 00:00 — 06:00 Wind: 6.214
Date: Wednesday, 11 July 2018, 06:00 — 12:00 Wind: 12.428
Date: Wednesday, 11 July 2018, 12:00 — 18:00 Wind: 8.078
Date: Wednesday, 11 July 2018, 18:00 — 00:00 Wind: 5.593
Date: Thursday, 12 July 2018, 00:00 — 06:00 Wind: 4.971
Date: Thursday, 12 July 2018, 06:00 — 12:00 Wind: 8.078
Date: Thursday, 12 July 2018, 12:00 — 18:00 Wind: 7.457
Date: Thursday, 12 July 2018, 18:00 — 00:00 Wind: 6.214
Date: Friday, 13 July 2018, 00:00 — 06:00 Wind: 5.593
Date: Friday, 13 July 2018, 06:00 — 12:00 Wind: 11.807
Date: Friday, 13 July 2018, 12:00 — 18:00 Wind: 9.321
Date: Friday, 13 July 2018, 18:00 — 00:00 Wind: 5.593
Date: Saturday, 14 July 2018, 00:00 — 06:00 Wind: 4.971
Date: Saturday, 14 July 2018, 06:00 — 12:00 Wind: 4.971
Date: Saturday, 14 July 2018, 12:00 — 18:00 Wind: 6.214
Date: Saturday, 14 July 2018, 18:00 — 00:00 Wind: 6.214
Date: Sunday, 15 July 2018, 00:00 — 06:00 Wind: 8.7
Date: Sunday, 15 July 2018, 06:00 — 12:00 Wind: 8.7
Date: Sunday, 15 July 2018, 12:00 — 18:00 Wind: 8.7
Date: Sunday, 15 July 2018, 18:00 — 00:00 Wind: 5.593
Date: Monday, 16 July 2018, 00:00 — 06:00 Wind: 4.971
Date: Monday, 16 July 2018, 06:00 — 12:00 Wind: 11.185
Date: Monday, 16 July 2018, 12:00 — 18:00 Wind: 11.185
Date: Monday, 16 July 2018, 18:00 — 00:00 Wind: 8.7
Date: Tuesday, 17 July 2018, 00:00 — 06:00 Wind: 7.457
Date: Tuesday, 17 July 2018, 06:00 — 12:00 Wind: 8.078
Date: Tuesday, 17 July 2018, 12:00 — 18:00 Wind: 6.836
Date: Tuesday, 17 July 2018, 18:00 — 00:00 Wind: 4.971
Date: Wednesday, 18 July 2018, 00:00 — 06:00 Wind: 3.728
Date: Wednesday, 18 July 2018, 06:00 — 12:00 Wind: 2.486
Date: Wednesday, 18 July 2018, 12:00 — 18:00 Wind: 6.214
Date: Wednesday, 18 July 2018, 18:00 — 00:00 Wind: 4.971
Date: Thursday, 19 July 2018, 00:00 — 06:00 Wind: 4.971
Date: Thursday, 19 July 2018, 06:00 — 12:00 Wind: 5.593
Date: Thursday, 19 July 2018, 12:00 — 18:00 Wind: 6.214
Date: Thursday, 19 July 2018, 18:00 — 00:00 Wind: 1.864
Date: Friday, 20 July 2018, 00:00 — 06:00 Wind: 2.486
Date: Friday, 20 July 2018, 06:00 — 12:00 Wind: 5.593
Date: Friday, 20 July 2018, 12:00 — 18:00 Wind: 8.078
Date: Friday, 20 July 2018, 18:00 — 00:00 Wind: 3.728
Date: Saturday, 21 July 2018, 00:00 — 06:00 Wind: 0.621
Date: Saturday, 21 July 2018, 06:00 — 12:00 Wind: 1.243
Date: Saturday, 21 July 2018, 12:00 — 18:00 Wind: 2.486
Date: Saturday, 21 July 2018, 18:00 — 00:00 Wind: 7.457
Date: Sunday, 22 July 2018, 00:00 — 06:00 Wind: 4.971
Date: Sunday, 22 July 2018, 06:00 — 12:00 Wind: 6.836
Date: Sunday, 22 July 2018, 12:00 — 18:00 Wind: 4.35
Date: Sunday, 22 July 2018, 18:00 — 00:00 Wind: 4.35
Date: Monday, 23 July 2018, 00:00 — 06:00 Wind: 2.486
Date: Monday, 23 July 2018, 06:00 — 12:00 Wind: 6.214
Date: Monday, 23 July 2018, 12:00 — 18:00 Wind: 6.836
Date: Monday, 23 July 2018, 18:00 — 00:00 Wind: 4.971
Date: Tuesday, 24 July 2018, 00:00 — 06:00 Wind: 3.107
Date: Tuesday, 24 July 2018, 06:00 — 12:00 Wind: 7.457
Date: Tuesday, 24 July 2018, 12:00 — 18:00 Wind: 4.35
Date: Tuesday, 24 July 2018, 18:00 — 00:00 Wind: 2.486
Date: Wednesday, 25 July 2018, 00:00 — 06:00 Wind: 1.243
Date: Wednesday, 25 July 2018, 06:00 — 12:00 Wind: 3.728
Date: Wednesday, 25 July 2018, 12:00 — 18:00 Wind: 6.836
Date: Wednesday, 25 July 2018, 18:00 — 00:00 Wind: 7.457
Date: Thursday, 26 July 2018, 00:00 — 06:00 Wind: 7.457
Date: Thursday, 26 July 2018, 06:00 — 12:00 Wind: 9.321
Date: Thursday, 26 July 2018, 12:00 — 18:00 Wind: 11.185
Date: Thursday, 26 July 2018, 18:00 — 00:00 Wind: 7.457
Date: Friday, 27 July 2018, 00:00 — 06:00 Wind: 6.836
Date: Friday, 27 July 2018, 06:00 — 12:00 Wind: 5.593
Date: Friday, 27 July 2018, 12:00 — 18:00 Wind: 4.35
Date: Friday, 27 July 2018, 18:00 — 00:00 Wind: 4.35
Date: Saturday, 28 July 2018, 00:00 — 06:00 Wind: 3.728
Date: Saturday, 28 July 2018, 06:00 — 12:00 Wind: 6.214
Date: Saturday, 28 July 2018, 12:00 — 18:00 Wind: 1.864
Date: Saturday, 28 July 2018, 18:00 — 00:00 Wind: 3.728
Date: Sunday, 29 July 2018, 00:00 — 06:00 Wind: 3.107
Date: Sunday, 29 July 2018, 06:00 — 12:00 Wind: 6.836
Date: Sunday, 29 July 2018, 12:00 — 18:00 Wind: 5.593
Date: Sunday, 29 July 2018, 18:00 — 00:00 Wind: 2.486
Date: Monday, 30 July 2018, 00:00 — 06:00 Wind: 1.864
Date: Monday, 30 July 2018, 06:00 — 12:00 Wind: 3.728
Date: Monday, 30 July 2018, 12:00 — 18:00 Wind: 4.971
Date: Monday, 30 July 2018, 18:00 — 00:00 Wind: 2.486
Date: Tuesday, 31 July 2018, 00:00 — 06:00 Wind: 1.243
Date: Tuesday, 31 July 2018, 06:00 — 12:00 Wind: 6.836
Date: Tuesday, 31 July 2018, 12:00 — 18:00 Wind: 6.836
Date: Tuesday, 31 July 2018, 18:00 — 00:00 Wind: 3.107
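Here is how the JSON breaks down to reach wind: the top-level detail key holds a list, and each item in that list has a ds field (the date and time range) and a wind field.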
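Since the detail list gives four 6-hour blocks per day, one possible next step is to average them into a single value per day. The sketch below assumes every item has the ds and wind fields shown in the output above, and that the day is everything before the last comma in ds. The function name daily_mean_wind and the sample list are illustrative only; the sample values are copied from the first rows of the output.

```python
from collections import defaultdict


def daily_mean_wind(detail):
    """Average the per-block 'wind' values for each calendar day."""
    by_day = defaultdict(list)
    for item in detail:
        # Drop the trailing time range after the last comma,
        # leaving e.g. 'Sunday, 1 July 2018'.
        day = item["ds"].rsplit(",", 1)[0]
        by_day[day].append(item["wind"])
    return {day: sum(values) / len(values) for day, values in by_day.items()}


# Sample items in the same shape as jsonObj['detail'].
sample_detail = [
    {"ds": "Sunday, 1 July 2018, 00:00 — 06:00", "wind": 0.621},
    {"ds": "Sunday, 1 July 2018, 06:00 — 12:00", "wind": 3.728},
    {"ds": "Sunday, 1 July 2018, 12:00 — 18:00", "wind": 3.107},
    {"ds": "Sunday, 1 July 2018, 18:00 — 00:00", "wind": 3.107},
    {"ds": "Monday, 2 July 2018, 00:00 — 06:00", "wind": 1.864},
    {"ds": "Monday, 2 July 2018, 06:00 — 12:00", "wind": 5.593},
]

for day, mean in daily_mean_wind(sample_detail).items():
    print("%-25s %.3f" % (day, mean))
```

With the real data, daily_mean_wind(jsonObj['detail']) would be the equivalent call.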
My exploration, and a very dirty "kind of solution" to the problem.
Look at the pandas solution: it works fine.
Look at the pandas source: we see that pandas is using _BeautifulSoupHtml5LibFrameParser.
Ergo: BeautifulSoup is fine.
Let's try curl:
$ curl https://www.timeanddate.com/weather/pakistan/lahore/historic\?month\=7\&year\=2018 > result.html
$ less result.html
Here is what we see:
</script><script type="text/javascript">
var data={"copyright":"Contents are strictly for use by
timeanddate.com","units":
{"temp":"°C","prec":"mm","wind":"km\/h","baro":"mbar"},
"temp":
[{"date":15304047E5,"temp":29},{"date":15304065E5,"temp":29},
{"date":15304083E5,"temp":29},{"date":15304101E5,"temp":28},
...
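I think this is the data the OP is looking for.
curl / wget / requests: any of them should be fine for fetching the page.
Python's str methods should be enough to extract this var data for json.loads.
The beauty of such a solution is that the data comes as is and does not need to be decoded from an HTML <table>.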
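A minimal sketch of that idea, using only str methods and the standard json module, is shown below. The function name extract_var_data and the sample HTML are made up for illustration; the sample only imitates the shape of the curl output above. It uses json.JSONDecoder.raw_decode rather than splitting on ';', so a semicolon inside a JSON string would not cut the payload short.

```python
import json


def extract_var_data(html, marker="var data="):
    """Return the JSON value assigned after `marker`, or None if absent."""
    start = html.find(marker)
    if start == -1:
        return None
    pos = start + len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand.
    while pos < len(html) and html[pos].isspace():
        pos += 1
    # Parse exactly one JSON value and ignore whatever follows it.
    obj, _end = json.JSONDecoder().raw_decode(html, pos)
    return obj


# Hypothetical snippet shaped like the page source shown above.
sample_html = (
    '<script type="text/javascript">\n'
    'var data={"units":{"temp":"°C","wind":"km\\/h"},'
    '"temp":[{"date":15304047E5,"temp":29}]};</script>'
)

data = extract_var_data(sample_html)
print(data["units"]["wind"], data["temp"][0]["temp"])
```

For the real page, the html argument would be the text fetched by requests (or the contents of result.html saved by curl).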
Personally, I like the pandas solution, since pandas is a great library in itself.
But pandas is not needed to solve this problem.