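I am scraping a web page with Selenium in Python. I want to skip the first two TR elements in a table because they are the title and the column headers. Is there a Selenium method or a Pythonic way to skip the first two TR elements?
I have tried using the XPath of the specific TR I want to start from, but that pulls in only that one TR rather than all of the TRs.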
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import statistics
import requests
import json
import numpy as np
import pandas as pd
import xlsxwriter
browser = webdriver.Chrome("/ProgramData/chocolatey/bin/chromedriver.exe")
browser.get(
"http://rotoguru1.com/cgi-bin/hyday.pl?mon=10&day=22&year=2019&game=fd")
table_rows = browser.find_element_by_xpath(
'/html/body/table/tbody/tr/td[3]/table[4]').find_element_by_tag_name('tbody').find_elements_by_tag_name('tr')
players = []
for row in table_rows:
    cells = row.find_elements_by_tag_name('td')
    pos = cells[0].text
    print(pos)
    name = cells[1].text
    print(name)
    fpts = cells[2].text
    salary = cells[3].text
    team = cells[4].text
    opp = cells[5].text
    minutes = cells[7].text
    players.append([pos, name, fpts, salary, team, opp, minutes])
df = pd.DataFrame(players, columns=[
"Position", "Name", "FPTS", "Salary", "Team", "Opponent", "Minutes"])
writer = pd.ExcelWriter('NBA_Stats', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
df.style.set_properties(**{'text-align': 'center'})
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.width', 1000)
print(players)
writer.save()
To skip the first two rows, just change the for loop to:
for r, row in enumerate(table_rows):
    if r < 2:
        continue
and leave the rest of the loop body unchanged.
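The pattern can be tried without a browser. The sketch below uses a plain list of strings as a stand-in for the Selenium row elements; the sample values are made up for illustration.

```python
# Stand-in for table_rows: the first two entries play the role of the
# title row and the header row (sample values are hypothetical).
table_rows = ["title", "header", "row A", "row B", "row C"]

data_rows = []
for r, row in enumerate(table_rows):
    if r < 2:
        # Skip indices 0 and 1, i.e. the first two rows.
        continue
    data_rows.append(row)

print(data_rows)
```

Because enumerate() works on any iterable, this approach also applies when the rows come from a generator rather than a list.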
Could you check whether the following XPath works for you?
//body//table[4]/tbody//tr[not(position()=1)][not(position()=1)]
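In the expression above, each chained predicate drops the first row of whatever remains, so two of them drop the first two rows. A single predicate, position() > 2, should select the same rows among the direct tr children of a tbody. The helper below is a hypothetical sketch that only builds such an expression; it has not been checked against the actual page.

```python
def rows_after_xpath(table_xpath: str, skip: int) -> str:
    """Build an XPath selecting the <tr> children of a table's <tbody>
    whose position is greater than `skip` (hypothetical helper)."""
    return f"{table_xpath}/tbody/tr[position() > {skip}]"


# Table location taken from the question; skip the two leading rows.
xpath = rows_after_xpath("/html/body/table/tbody/tr/td[3]/table[4]", 2)
print(xpath)

# With the Selenium version used in the question, this could be passed as:
# table_rows = browser.find_elements_by_xpath(xpath)
```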
find_elements_by_tag_name() returns a list, so you can perform any regular list operation on it. For example, you can slice the list:
for row in table_rows[2:]:
This skips the first two rows.
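A small browser-free sketch of the slicing approach, assuming each row has already been reduced to a list of cell texts (the player names and numbers are invented for illustration):

```python
# Stand-in for the text of each row's cells (values are made up).
rows = [
    ["Title"],
    ["Pos", "Name", "FPTS"],
    ["PG", "Player One", "40.1"],
    ["SF", "Player Two", "35.6"],
]

players = []
for cells in rows[2:]:  # the slice starts at index 2, dropping rows 0 and 1
    pos, name, fpts = cells
    players.append([pos, name, fpts])

print(players)

# Slicing never raises IndexError: a list with fewer than three items
# simply yields an empty result.
print(rows[:1][2:])
```

Slicing copies the list references rather than the elements themselves, so the cost is negligible for a table of this size.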