How do I scrape data from a web page that refreshes after submitting data with Selenium and Python?

Question (votes: 0, answers: 2)

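I'm building a geolocation web scraper with Python and Selenium. When I enter data on this website, the page refreshes (keeping the same URL), and when I try to read the latitude and longitude inputs, nothing is printed.

Here's the sample output; it returns an empty string.

I noticed that the value attribute changes after the data is entered: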

<input id="place" name="place" type="text" placeholder="Type a place name" class="width70" style="text-transform:capitalize;" value="" required="">

Should I manipulate it? Thanks :)

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

counter = 0

locations = [

    'Republic of the Philippines',
    'Heaven',
    'Philippines',
]

latitude = []
longtitude = []

browser = webdriver.Chrome('C://Users/user1/Portable Python 3.7.0     x64/App/Python/Lib/site-packages/chromedriver')

url = 'https://www.latlong.net/'

for i in locations:

    browser.get(url)
    bar = browser.find_element_by_id('place')
    bar.send_keys(i)
    bar.send_keys(Keys.ENTER)
    time.sleep(3)
    lat = browser.find_element_by_id('lat')
    lng = browser.find_element_by_id('lng')

    time.sleep(3)

    latitude.append(lat.text)
    longtitude.append(lng.text)

    print(latitude[counter])
    print(longtitude[counter])

    counter+=1

    browser.refresh()

Tags: python, selenium, web-scraping, refresh

2 Answers
0 votes

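The problem is that if you inspect the element after sending Keys.ENTER, there is no text to read. The page seems to use some other technique to replace the placeholder: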

<div class="col-6 m2">
   <label for="lat">Latitude</label>
   <input type="text" name="lat" id="lat" placeholder="lat coordinate">
</div>

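What you can do instead is find the element with the id "latlngspan". It sits below the map and contains both values, lat and long, on which you can do some simple string manipulation to get the format you need.

Does this work for you?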
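As a minimal sketch of that string manipulation, assuming the span's text looks like "(14.693390, 121.067238)" (the exact format is an assumption, and `parse_latlng_span` is a hypothetical helper name, not part of Selenium):

```python
def parse_latlng_span(text):
    """Turn a string such as "(14.693390, 121.067238)" into (lat, lng) floats.

    Returns None when the text does not hold exactly two numbers.
    """
    # Drop surrounding whitespace and the parentheses, then split on the comma.
    parts = text.strip().strip("()").split(",")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None


# Hypothetical usage with Selenium (needs a running browser):
# span_text = browser.find_element(By.ID, "latlngspan").text
# print(parse_latlng_span(span_text))
print(parse_latlng_span("(14.693390, 121.067238)"))
```

Returning None instead of raising lets the caller skip a place that produced no coordinates.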


0 votes

You can make a POST request:

import requests
from bs4 import BeautifulSoup as bs
import re

url = 'https://www.latlong.net/'
locations = ['Republic of the Philippines', 'Heaven', 'Philippines']
latitude = []
longitude = []

with requests.Session() as sess:

    for i in locations: 
        r = sess.get(url)
        soup = bs(r.content, 'lxml')
        token = soup.select_one('#lltoken')['value']
        data = { 'place': i, 'lltoken': token }
        r = sess.post(url, data = data)
        s = r.text

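        try:
            # The page embeds the result as a JavaScript call: sm(lat, lng, zoom)
            lat_lon = re.findall( r'sm\((-?\d+\.\d+),(-?\d+\.\d+)', s)[0]
            lat = lat_lon[0]
            lon = lat_lon[1]
            latitude.append(lat)
            longitude.append(lon)
        except IndexError:
            # No sm(...) call was found, so print the response for inspection
            print(s)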

print(latitude)
print(longitude)
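If a place cannot be resolved, the `sm(...)` call may not match in the response. A hedged sketch that isolates the regex step in a helper returning None instead of raising; `extract_sm_coords` is a hypothetical name, and the sample string imitates the `sm(14.693390,121.067238,12)` call from the page script quoted at the end of this answer:

```python
import re

# Same pattern as in the code above: the first two arguments of sm(lat, lng, zoom).
SM_CALL = re.compile(r"sm\((-?\d+\.\d+),(-?\d+\.\d+)")


def extract_sm_coords(page_text):
    """Return (lat, lng) as strings, or None if no sm(...) call is found."""
    match = SM_CALL.search(page_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_sm_coords("mymap.on('click', onMapClick);\nsm(14.693390,121.067238,12)"))
print(extract_sm_coords("<html>no coordinates here</html>"))
```

The function definition `function sm(lt,ln,zm)` in the page does not match, because the pattern requires numeric arguments.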

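Selenium:

You can get them from the src of the map iframe. Wait conditions did not seem to be necessary, but you may want to consider adding them (or I'm happy to add them to show you).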

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import re

locations = [
    'Republic of the Philippines',
    'Heaven',
    'Philippines',
]

latitude = []
longitude = []

url = 'https://www.latlong.net/'

browser = webdriver.Chrome()
browser.get(url)

for i in locations:
    # Locate the search box by its id, clear it and submit the next place name
    bar = browser.find_element(By.ID, 'place')
    bar.clear()
    bar.send_keys(i)
    bar.send_keys(Keys.ENTER)
    # The map iframe's src carries the coordinates (center=lat,lng)
    s = browser.find_element(By.ID, 'latlongmape').get_attribute('src')
    lat_lon = re.findall( r'(-?\d+\.\d+)', s)
    lat = lat_lon[0]
    lon = lat_lon[1]
    latitude.append(lat)
    longitude.append(lon)

print(latitude)
print(longitude)
browser.quit()
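The regex above scans the whole src string. As an alternative sketch, assuming the iframe URL carries the coordinates in a `center=lat,lng` query parameter (as the page script quoted at the end of this answer suggests), the standard library's URL parser can read that parameter directly. `coords_from_embed_src` is a hypothetical helper name and the `key` value in the sample URL is a placeholder:

```python
from urllib.parse import urlparse, parse_qs


def coords_from_embed_src(src):
    """Return (lat, lng) strings from the 'center' query parameter, or None."""
    query = parse_qs(urlparse(src).query)
    center = query.get("center")
    if not center:
        return None
    parts = center[0].split(",")
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


sample = ("https://www.google.com/maps/embed/v1/view?key=PLACEHOLDER"
          "&maptype=satellite&center=14.693390,121.067238&zoom=12")
print(coords_from_embed_src(sample))
```

Reading a named parameter avoids picking up unrelated decimal numbers if the URL format ever gains more of them.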

With wait conditions, using a different element as the source:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import re
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

locations = [

    'Republic of the Philippines',
    'Heaven',
    'Philippines',
]

latitude = []
longitude = []

url = 'https://www.latlong.net/'

browser = webdriver.Chrome()
browser.get(url)

for i in locations:
    bar = WebDriverWait(browser,5).until(EC.presence_of_element_located((By.ID, "place")))
    bar.clear()
    bar.send_keys(i)
    bar.send_keys(Keys.ENTER)
    s = WebDriverWait(browser,5).until(EC.presence_of_element_located((By.ID, "coordinateslink"))).text
    lat_lon = re.findall( r'(-?\d+\.\d+)', s)
    lat = lat_lon[0]
    lon = lat_lon[1]
    latitude.append(lat)
    longitude.append(lon)

print(latitude)
print(longitude)
browser.quit()

You can also use JavaScript to return the values:

lat = browser.execute_script("return document.getElementById('lat').value;")
lon = browser.execute_script("return document.getElementById('lng').value;")
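These calls work because the page's script assigns the coordinates to each input's `value` property, which `.text` does not expose. A small sketch that wraps the same call in a helper and passes the element id through `arguments[0]`; `read_input_value` is a hypothetical name, and the stub driver exists only so the example runs without a browser:

```python
def read_input_value(driver, element_id):
    """Return the current value of an <input>, as seen by the page's JavaScript."""
    script = "return document.getElementById(arguments[0]).value;"
    return driver.execute_script(script, element_id)


class StubDriver:
    """Stand-in for a Selenium WebDriver, used only to show the call shape."""

    def __init__(self, values):
        self.values = values

    def execute_script(self, script, *args):
        # A real driver would run the script in the browser; this stub looks the id up.
        return self.values.get(args[0], "")


stub = StubDriver({"lat": "14.693390", "lng": "121.067238"})
print(read_input_value(stub, "lat"), read_input_value(stub, "lng"))
```

With a real driver, `browser.find_element(By.ID, 'lat').get_attribute('value')` should give the same result.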

You can also extract them with a regex from one of the script tags:

lat_lon = re.findall( r'sm\((-?\d+\.\d+),(-?\d+\.\d+)', browser.page_source)[0]
lat = lat_lon[0]
lon = lat_lon[1]
print(lat, lon)

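Where the values are found:

You can see all the different places where JavaScript assigns the coordinate values in the script that contains the following JS: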

<script>
var mymap = L.map('latlongmap');
var mmr = L.marker([0,0]);
mmr.bindPopup('0,0');
mmr.addTo(mymap);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png?{foo}', {foo: 'bar',
attribution:'&copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a>'}).addTo(mymap);

mymap.on('click', onMapClick);

sm(14.693390,121.067238,12)
function isll(num) {
var val = parseFloat(num);
if (!isNaN(val) && val <= 90 && val >= -90)
    return true;
else
    return false;
}

function onMapClick(e) {
mmr.setLatLng(e.latlng);
setui(e.latlng.lat,e.latlng.lng,mymap.getZoom());
}

function dec2dms(e,t) {
document.getElementById("dms-lat").innerHTML = getdms(e, !0), document.getElementById("dms-lng").innerHTML = getdms(t, !1)
}
function getdms(e, t) {
var n = 0, m = 0, l = 0, a = "X";
return a = t && 0 > e ? "S" : !t && 0 > e ? "W" : t ? "N" : "E", d = Math.abs(e), n = Math.floor(d), l = 3600 * (d - n), m = Math.floor(l / 60), l = Math.round(1e4 * (l - 60 * m)) / 1e4, n + "&deg; " + m + "' " + l + "'' " + a
}

function sm(lt,ln,zm) {
    setui(lt,ln,zm);
    mmr.setLatLng(L.latLng(lt,ln));
    mymap.setView([lt,ln], zm);
}

function setui(lt,ln,zm) {
    lt = Number(lt).toFixed(6);
    ln = Number(ln).toFixed(6);
mmr.setPopupContent(lt + ',' + ln).openPopup();
document.getElementById("lat").value=lt;
document.getElementById("lng").value=ln;
document.getElementById("latlngspan").innerHTML ="(" + lt + ", " + ln + ")"; 
document.getElementById("coordinatesurl").value = "https://www.latlong.net/c/?lat=" + lt + "&long=" + ln;
document.getElementById("coordinateslink").innerHTML = '&lt;a href="https://www.latlong.net/c/?lat=' + lt + "&amp;long=" + ln + '" target="_blank"&gt;(' + lt + ", " + ln + ")&lt;/a&gt;";
dec2dms(lt,ln);
document.getElementById('latlongmape').src='https://www.google.com/maps/embed/v1/view?key=AIzaSyALrSTy6NpqdhIOUs3IQMfvjh71td2suzY&maptype=satellite&'+'center='+lt+','+ ln+'&zoom='+zm;
}
       
</script>