How do I print all the text in a web table column with Selenium in Python?

Question · Votes: 0 · Answers: 2

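I'm trying to print the text of every cell in one column of a web table from Python, using a for loop and an XPath expression for the cells in that column. The XPath expression looks like this: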

//*[@id="webTable"]/tbody/tr[2]/td[6]

The for loop I'm using is written like this:

for x in range(totalRows):
    y = driver.find_elements(by = By.XPATH, value = '//*[@id="webTable"]/tbody/tr[' + str(x) + ']/td[6]')
    print(y)

However, when I run the program, this is the output I get:

[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="c488195e-8751-43c8-9d01-6e873cb2cc4a")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="70f9ad39-4bdd-4bcf-b869-c31968de4492")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="f8fd427e-2bd3-4995-8b24-7cb7bda14f1a")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="0541eb71-24a1-44e9-bb9d-bacc63426bad")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="b19a839e-a6c1-43f2-bcf1-1f0692ff2c0f")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="b427383a-31a5-49f8-a466-62fb5a489047")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="1cd4bd3f-6e7f-4a89-950e-0f5dab47eabd")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="5c964e47-2fff-4c4d-9743-eecbd1c7bea6")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="54ff1ef7-0693-43e2-939e-c387f8f20e06")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="21a63bd7-7dc5-4860-bfb2-1309a842c2f7")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="aee78709-f4ee-4e0f-8cb7-6c3114b52fba")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="28ef515e-4c66-472b-8126-76793eeebee2")>]
[<selenium.webdriver.remote.webelement.WebElement (session="e8afe17e1e80e6c09dd2656800326654", element="2fb995ff-9100-4124-9efe-f8c2bfe49767")>]

I also tried writing the for loop like this:

for x in range(totalRows):
    y = driver.find_elements(by = By.XPATH, value = '//*[@id="webTable"]/tbody/tr[' + str(x) + ']/td[6]')
    print(y.text)

and:

for x in range(totalRows):
    y = driver.find_elements(by = By.XPATH, value = '//*[@id="webTable"]/tbody/tr[' + str(x) + ']/td[6]').text
    print(y)

But when I write it either of these ways, I get this error:

AttributeError: 'list' object has no attribute 'text'

How else can I extract the text inside the cells?

python selenium-webdriver web-scraping xpath
2 Answers
1 vote

Here is one solution:

table = driver.find_element(by=By.XPATH, value='//*[@id="webTable"]/tbody')
rows = table.find_elements(by=By.TAG_NAME, value="tr")

# column to choose by its index, say 2nd column in the table
desired_column = 1 
desired_column_data = []

for row in rows:
    columns = row.find_elements(by=By.TAG_NAME, value='td')

    for index, col in enumerate(columns):
        if index == desired_column:
            desired_column_data.append(col.text)

print(desired_column_data)

Hope this helps :)
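The same column can probably also be collected with a single XPath that leaves the row position out, so every row's cell comes back in one call. This is only a sketch: `column_texts` is a hypothetical helper name, `driver` is assumed to be an already-running Selenium WebDriver, and the table id and column number are taken from the question.

```python
def column_texts(driver, table_id, column):
    """Return the text of each cell in the given 1-based column of the table body."""
    # By.XPATH (selenium.webdriver.common.by) is the plain string "xpath";
    # the literal is used here so the sketch needs no import of its own.
    xpath = f'//*[@id="{table_id}"]/tbody/tr/td[{column}]'
    # find_elements returns a list of WebElements, one per matching cell.
    cells = driver.find_elements("xpath", xpath)
    # .text lives on each WebElement, not on the list itself.
    return [cell.text for cell in cells]
```

With the question's table this would be called as `print(column_texts(driver, "webTable", 6))`. Rows with fewer than six `td` cells are simply not matched by this XPath, so they are skipped rather than reported.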


1 vote

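driver.find_elements returns a list of WebElements, so reading .text on that list raises the AttributeError you are seeing. Use driver.find_element (singular) inside the loop over totalRows instead. Also note that XPath positions start at 1, so tr[0] matches nothing and find_element would raise NoSuchElementException for it; the loop below therefore runs from 1 to totalRows.

# XPath positions are 1-based: iterate 1..totalRows, not 0..totalRows-1
for x in range(1, totalRows + 1):
    # find_element returns a single WebElement, which has a .text attribute
    y = driver.find_element(by = By.XPATH, value = '//*[@id="webTable"]/tbody/tr[' + str(x) + ']/td[6]').text
    print(y)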
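The list-versus-single-element difference and the 1-based row positions can be tried without a browser. The sketch below uses the standard library's `xml.etree.ElementTree` on a made-up three-row table; the table content is hypothetical, and `findall` / `find` only stand in for Selenium's `find_elements` / `find_element`, they are not the Selenium API.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-row table standing in for the real page.
TABLE = """
<table id="webTable">
  <tbody>
    <tr><td>r1c1</td><td>r1c2</td></tr>
    <tr><td>r2c1</td><td>r2c2</td></tr>
    <tr><td>r3c1</td><td>r3c2</td></tr>
  </tbody>
</table>
"""

root = ET.fromstring(TABLE)
total_rows = len(root.findall("./tbody/tr"))

second_column = []
for x in range(1, total_rows + 1):  # XPath positions start at 1, not 0
    matches = root.findall(f"./tbody/tr[{x}]/td[2]")  # a list, like find_elements
    cell = root.find(f"./tbody/tr[{x}]/td[2]")        # one element, like find_element
    # The list has no .text of its own; its first item is the cell itself.
    assert isinstance(matches, list) and matches[0] is cell
    second_column.append(cell.text)

print(second_column)  # ['r1c2', 'r2c2', 'r3c2']
```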