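I'm currently trying to scrape all the price tables from the following website: http://aeroportos.weebly.com/fuel-prices.html#.W7SatGj7Sbj
However, I'm having trouble locating the tables with XPath. Also, I'm not sure whether I can scrape all the tables in a single script, or whether I have to go through them one by one manually.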
from selenium import webdriver
from selenium.webdriver.common.by import By

def get_prices():
    url = "http://aeroportos.weebly.com/fuel-prices.html#.W7SM3mj7Sbj"
    driver = webdriver.Firefox()
    driver.implicitly_wait(30)
    driver.get(url)
    # find_element returns only the first match, so at most one table's rows are read
    rows = driver.find_element(By.XPATH, '//*[@id="wsite-content"]/div/table/tbody').find_elements(By.TAG_NAME, 'tr')
    prices = []
    for row in rows:
        cells = row.find_elements(By.TAG_NAME, 'td')
        country = cells[0].text
        code = cells[1].text
        name = cells[2].text
        price = cells[3].text
        # list.append takes a single argument, so store each row as a tuple
        prices.append((country, code, name, price))
    print(prices)
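The answer lies in writing the right XPath: one that picks up every data row (excluding the header rows) from all the tables on the page.
The following code should work: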
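from selenium import webdriver
from selenium.webdriver.common.by import By

def get_prices():
    url = "http://aeroportos.weebly.com/fuel-prices.html#.W7SM3mj7Sbj"
    driver = webdriver.Firefox()
    driver.implicitly_wait(30)
    try:
        driver.get(url)
        # In every table, take the rows that follow the header row containing "Airport".
        # find_elements (plural) returns all matches, not just the first one.
        rows = driver.find_elements(By.XPATH, '//*[contains(text(), "Airport")]/ancestor::tr/following-sibling::tr')
        prices = []
        for row in rows:
            cells = row.find_elements(By.TAG_NAME, 'td')
            if len(cells) < 5:
                continue  # skip rows that don't have all five columns
            region = cells[0].text
            country = cells[1].text
            code = cells[2].text
            name = cells[3].text
            price = cells[4].text
            prices.append((region, country, code, name, price))
    finally:
        driver.quit()
    print(prices)
    return prices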
Note: I haven't run this code, but it should work. Thanks.
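The XPath above keeps, within each table, the rows that come after the header row containing "Airport". As a browser-free sketch of that same selection logic, here is a version using Python's standard-library `html.parser` instead of Selenium. It assumes the page HTML is already available as a string, that each table's header row contains the word "Airport", and that tables are not nested; `TableRowCollector` and `data_rows` are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by row and by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>; each row is a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def _finish_cell(self):
        # Close the current cell (also used when an end tag was omitted).
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        # Close the current row and attach it to the most recent table.
        self._finish_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._finish_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._finish_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        # Text inside nested tags such as <strong> still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)


def data_rows(html, marker="Airport"):
    """Return, for every table, the rows after the first row containing `marker`."""
    parser = TableRowCollector()
    parser.feed(html)
    parser.close()
    rows = []
    for table in parser.tables:
        for i, row in enumerate(table):
            if any(marker in cell for cell in row):
                rows.extend(table[i + 1:])  # everything after the header row
                break  # later rows mentioning "Airport" are data, not headers
    return rows
```

Stopping at the first matching row mirrors how the XPath behaves: a node set contains each row at most once, so a data row whose airport name happens to contain "Airport" does not cause duplicates or get dropped.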