How to scrape an interactive web page with Python

Question

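I would like to know how to scrape the following website: http://chonos.ifop.cl/flow/

There is a map on the right side of the page. When you click a point on it, a time series is displayed in a Highcharts chart on the left. I would like to extract those series iteratively, but so far I have not managed to. This is my code so far: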

import urllib.request

from bs4 import BeautifulSoup

site_url = 'http://chonos.ifop.cl/flow/'
r = urllib.request.urlopen(site_url)
site_content = r.read()

# Parse the static HTML returned by the server
s = BeautifulSoup(site_content, 'html.parser')
print(s.prettify()[:100])

# Look for table cells and tables in the static markup
s.find_all('td')
s.find_all('table')
s.find_all('table', attrs={'class': 'uk-table uk-table-small uk-table-striped'})
python web-scraping beautifulsoup
1 Answer

When I use DevTools (tab: Network) in Firefox/Chrome to watch the requests the browser sends to the server when I click on the map, I see a URL like the one below. It returns JSON data that contains a key named series.

You can open this link directly in the browser to see the JSON data:

http://chonos.ifop.cl/flow/mapclick?&REQUEST=GetFeatureInfo&SERVICE=WMS&SRS=EPSG%3A4326&STYLES=&TRANSPARENT=true&VERSION=1.1.1&FORMAT=image%2Fpng&BBOX=-84.48486328125%2C-50.16282433381728%2C-59.54589843750001%2C-45.75219336063107&HEIGHT=300&WIDTH=1135&LAYERS=aguadulce%3Aoutlet_points&QUERY_LAYERS=aguadulce%3Aoutlet_points&INFO_FORMAT=text%2Fhtml&LAT=-46.528634695271684&LON=-71.41113281250001&X=595&Y=51


I can also use this link in code:

import requests

url = 'http://chonos.ifop.cl/flow/mapclick'

# Query parameters copied from the request seen in DevTools.
# requests URL-encodes the values, so they are written unencoded here.
params = {
    'REQUEST': 'GetFeatureInfo',
    'SERVICE': 'WMS',
    'SRS': 'EPSG:4326',
    'STYLES': '',
    'TRANSPARENT': 'true',
    'VERSION': '1.1.1',
    'FORMAT': 'image/png',
    'BBOX': '-84.48486328125,-50.16282433381728,-59.54589843750001,-45.75219336063107',
    'HEIGHT': '300',
    'WIDTH': '1135',
    'LAYERS': 'aguadulce:outlet_points',
    'QUERY_LAYERS': 'aguadulce:outlet_points',
    'INFO_FORMAT': 'text/html',
    'LAT': '-46.528634695271684',
    'LON': '-71.41113281250001',
    'X': '595',
    'Y': '51',
}

response = requests.get(url, params=params)

# The response body is JSON; the time series is under data['series']['sim']
data = response.json()

for item in data['series']['sim']:
    print(item)

Result:

[283996800000, 985.352]
[284083200000, 1115.734]
[284169600000, 1099.139]
[284256000000, 1146.895]
[284342400000, 1127.501]
[284428800000, 1146.251]
[284515200000, 1048.681]
[284601600000, 939.899]
[284688000000, 941.33]
[284774400000, 905.143]

...
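The first number in each pair looks like a Unix timestamp in milliseconds, which is the convention Highcharts uses for datetime axes. That reading is an assumption, not something the site confirms here. Under it, the pairs can be turned into readable dates with a minimal sketch like the following (the helper name `to_dated_points` is made up for illustration):

```python
from datetime import datetime, timezone


def to_dated_points(series):
    """Convert [epoch_ms, value] pairs into (ISO date string, value) tuples.

    Assumes the first element is milliseconds since 1970-01-01 UTC.
    """
    points = []
    for epoch_ms, value in series:
        moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
        points.append((moment.date().isoformat(), value))
    return points


# The first three pairs from the output above
sample = [
    [283996800000, 985.352],
    [284083200000, 1115.734],
    [284169600000, 1099.139],
]

for day, flow in to_dated_points(sample):
    print(day, flow)
```

With that interpretation the series starts on 1979-01-01 and advances one day per point.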

In the link I see the parameters LAT= and LON=, so if you change the latitude and longitude you should get data for a different location.
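One way to try other coordinates is to keep the captured parameters and swap only LAT and LON. The sketch below just builds the URL, so it can be inspected before any request is sent. The function name `build_mapclick_url` is hypothetical, and it is an unverified assumption that the server looks only at LAT/LON; it may also depend on BBOX, X and Y, in which case those would have to change too.

```python
from urllib.parse import urlencode

BASE_URL = 'http://chonos.ifop.cl/flow/mapclick'

# Parameters copied from the request captured in DevTools
BASE_PARAMS = {
    'REQUEST': 'GetFeatureInfo',
    'SERVICE': 'WMS',
    'SRS': 'EPSG:4326',
    'STYLES': '',
    'TRANSPARENT': 'true',
    'VERSION': '1.1.1',
    'FORMAT': 'image/png',
    'BBOX': '-84.48486328125,-50.16282433381728,-59.54589843750001,-45.75219336063107',
    'HEIGHT': '300',
    'WIDTH': '1135',
    'LAYERS': 'aguadulce:outlet_points',
    'QUERY_LAYERS': 'aguadulce:outlet_points',
    'INFO_FORMAT': 'text/html',
    'X': '595',
    'Y': '51',
}


def build_mapclick_url(lat, lon):
    """Return the mapclick URL with LAT/LON set and all other parameters unchanged."""
    params = dict(BASE_PARAMS, LAT=str(lat), LON=str(lon))
    return BASE_URL + '?' + urlencode(params)


# Rebuild the URL for the coordinates from the original request
print(build_mapclick_url(-46.528634695271684, -71.41113281250001))
```

The resulting string can be passed to `requests.get` in place of the `url`/`params` pair.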


EDIT:

As @Modammed said, when you click one of the special locations, the page loads the data from a similar link:

https://chonos.ifop.cl/flow/stnclick?index=50

You can use this link the same way as the previous one.

If you change index, you get a different location.

import requests

url = 'http://chonos.ifop.cl/flow/stnclick'

params = {
    'index': 0
}

# Request the first 10 locations, one index at a time
for number in range(10):
    params['index'] = number

    response = requests.get(url, params=params)

    data = response.json()

    print('---', data['name'], '---')

    #for item in data['series']['sim'][:5]:  # show first 5 values
    for item in data['series']['sim']:       # show all values
        print(item)

Result (first 5 values for each location):

--- Rio Caleta En Tierra Del Fuego ---
[283996800000, 4.41]
[284083200000, 4.27]
[284169600000, 4.13]
[284256000000, 4.0]
[284342400000, 3.95]

--- Rio La Plata Antes Junta Rio Hueyusca ---
[283996800000, 4.43]
[284083200000, 4.15]
[284169600000, 3.88]
[284256000000, 3.63]
[284342400000, 3.39]

--- Rio Hueyusca En Camarones ---
[283996800000, 12.46]
[284083200000, 11.71]
[284169600000, 11.0]
[284256000000, 10.33]
[284342400000, 9.7]

--- Rio Negro En Las Lomas ---
[283996800000, 9.97]
[284083200000, 8.98]
[284169600000, 8.08]
[284256000000, 7.3]
[284342400000, 6.61]

--- Rio Maullin En Las Quemas ---
[283996800000, 35.37]
[284083200000, 33.34]
[284169600000, 31.53]
[284256000000, 29.8]
[284342400000, 28.47]
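Printing is fine for a quick look, but the series are easier to reuse once they are saved. A minimal sketch of writing several responses to CSV follows. The sample dictionaries are hand-built from the values shown above and mimic only the `name` and `series` / `sim` keys used in this answer; the function names and column headers are illustrative choices, not part of the site's API.

```python
import csv
import io


def series_to_rows(name, series):
    """Yield (location name, epoch_ms, value) rows for one response."""
    for epoch_ms, value in series:
        yield (name, epoch_ms, value)


def write_csv(stations, handle):
    """Write every location's series to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow(['name', 'epoch_ms', 'value'])
    for data in stations:
        writer.writerows(series_to_rows(data['name'], data['series']['sim']))


# Two responses trimmed to the first values shown above
stations = [
    {'name': 'Rio Caleta En Tierra Del Fuego',
     'series': {'sim': [[283996800000, 4.41], [284083200000, 4.27]]}},
    {'name': 'Rio Negro En Las Lomas',
     'series': {'sim': [[283996800000, 9.97], [284083200000, 8.98]]}},
]

buffer = io.StringIO()
write_csv(stations, buffer)
print(buffer.getvalue())
```

In the loop above, each `data` dictionary could be appended to a list and passed to `write_csv` together with a file opened using `open('flows.csv', 'w', newline='')`.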