Scraping multiple tables with Selenium

Question · 0 votes · 1 answer

I am currently trying to scrape all the fuel-price tables from the following website: http://aeroportos.weebly.com/fuel-prices.html#.W7SatGj7Sbj

However, I am having trouble locating the tables with XPath. Also, I am not sure whether I can scrape all of the tables in a single script, or whether I have to inspect each one manually?

from selenium import webdriver

def get_prices():
    url = "http://aeroportos.weebly.com/fuel-prices.html#.W7SM3mj7Sbj"
    driver = webdriver.Firefox()
    driver.implicitly_wait(30)
    driver.get(url)
    rows = driver.find_element_by_xpath('//*[@id="wsite-content"]/div/table/tbody').find_elements_by_tag_name('tr')
    prices = []
    for row in rows:
        cells = row.find_elements_by_tag_name('td')
        country = cells[0].text
        code = cells[1].text
        name = cells[2].text
        price = cells[3].text
        prices.append(region, country, code, name, price)
    print(prices)
python-3.x selenium web-scraping
1 Answer · 0 votes

The answer lies in writing the right XPath: one that selects all of the data rows (excluding the header row) across every table on the page.

The following code should work:

from selenium import webdriver

def get_prices():
    url = "http://aeroportos.weebly.com/fuel-prices.html#.W7SM3mj7Sbj"
    driver = webdriver.Firefox()
    driver.implicitly_wait(30)
    driver.get(url)
    # Every table starts with a header row containing "Airport"; take all
    # rows that follow it, in every table on the page. Note the plural
    # find_elements_by_xpath, so that rows is a list we can iterate over.
    rows = driver.find_elements_by_xpath('//*[contains(text(), "Airport")]/ancestor::tr/following-sibling::tr')
    prices = []
    for row in rows:
        cells = row.find_elements_by_tag_name('td')
        region = cells[0].text
        country = cells[1].text
        code = cells[2].text
        name = cells[3].text
        price = cells[4].text
        # append takes a single argument, so collect the fields into a tuple
        prices.append((region, country, code, name, price))
    print(prices)
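The `ancestor::tr/following-sibling::tr` idea can be sanity-checked without launching a browser. The sketch below (not part of the original answer) runs the same XPath against a minimal inline HTML fragment using lxml, which supports full XPath 1.0; the fragment and its values are made up for illustration:

```python
from lxml import html

# A minimal stand-in for one of the page's tables: a header row
# containing "Airport", followed by two data rows.
snippet = """
<table>
  <tr><td><strong>Airport</strong></td><td>Price</td></tr>
  <tr><td>LIS</td><td>1.20</td></tr>
  <tr><td>OPO</td><td>1.15</td></tr>
</table>
"""

doc = html.fromstring(snippet)
# Same axis trick as the Selenium answer: find the header cell,
# climb to its row, then take every row that follows it.
rows = doc.xpath('//*[contains(text(), "Airport")]/ancestor::tr/following-sibling::tr')
data = [[td.text for td in row.xpath('./td')] for row in rows]
print(data)  # [['LIS', '1.20'], ['OPO', '1.15']]
```

Because `//` searches the whole document, the same expression would pick up the data rows of every table on the real page, which is what makes a single script sufficient.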

Note: I have not executed the code, but it should work fine. Thanks.
