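I want to scrape contact information from the website http://buyersguide.recyclingtoday.com/search with selenium. To keep each piece of information matched to the right company, I want to select the table rows first and then pick the fields out of each row. My simplified code is below. My question is how to select the information from each row, for example the company name and the email.
Code: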
from time import sleep
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
driver = webdriver.Chrome(r'D:\chromedriver_win32\chromedriver.exe')
driver.get('http://buyersguide.recyclingtoday.com/search')
rows = driver.find_elements_by_xpath('//*[@id="Body_tbl"]/tbody/tr')
for row in rows:
    email = row.find_element_by_xpath('//*/tr/td[3]/a').text
    company = row.find_element_by_xpath('//*/tr/td[1]').text
I ran the code as suggested in the answer below, but I am still facing a problem:
from time import sleep
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
driver = webdriver.Chrome(r'D:\chromedriver_win32\chromedriver.exe')
driver.get('http://buyersguide.recyclingtoday.com/search')
rows = driver.find_elements_by_xpath('//*[@id="Body_tbl"]/tbody/tr')
records = []
for row in rows:
    company = row.find_element_by_xpath('./td[1]').text
    address = row.find_element_by_xpath('./td[2]').text
    contact = row.find_element_by_xpath('./td[3]//a').text
    number = row.find_element_by_xpath('./td[5]').text
    records.append((company, address, contact, number))
df = pd.DataFrame(records, columns=['company', 'address', 'contact', 'number'])
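Nothing is selected.
You can get the data like this.
You have to find the data rows in the table, excluding the header rows.
Here is an example based on your HTML.
Example using Python: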
rows = driver.find_elements_by_xpath("//td[@style='font-weight:bold;']//parent::tr")
for row in rows:
    company = row.find_element_by_xpath('./td[1]').text
    address = row.find_element_by_xpath('./td[2]').text
    contact = row.find_element_by_xpath('./td[3]//a').text
    number = row.find_element_by_xpath('./td[5]').text
Example using Java:
// findElements takes a By locator, not a raw XPath string
List<WebElement> findData = driver.findElements(By.xpath("//td[@style='font-weight:bold;']//parent::tr"));
for (WebElement webElement : findData) {
    String getValueofCompany = webElement.findElement(By.xpath("./td[1]")).getText();
    String getValueofAddress = webElement.findElement(By.xpath("./td[2]")).getText();
    String getValueofContact = webElement.findElement(By.xpath("./td[3]//a")).getText();
    String getValueofPhoneNumber = webElement.findElement(By.xpath("./td[5]")).getText();
}
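The idea of keeping only the data rows can be tried offline, without a browser. What follows is a minimal sketch rather than the page's real markup: the sample table, its header rows, and the helper name data_rows are assumptions made for illustration, and the standard library's xml.etree.ElementTree (which supports a small XPath subset) stands in for Selenium. It keeps only rows that have at least three td cells, which is one possible way to skip header rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two header rows followed by two data rows.
SAMPLE_HTML = """
<table id="Body_tbl"><tbody>
<tr><th colspan="3">Search results</th></tr>
<tr><th>Company</th><th>Address</th><th>Contact</th></tr>
<tr><td>Acme Recycling</td><td>1 Main St</td><td><a href="mailto:info@acme.example">Jane Doe</a></td></tr>
<tr><td>Beta Metals</td><td>2 Side Rd</td><td><a href="mailto:sales@beta.example">John Roe</a></td></tr>
</tbody></table>
""".strip()


def data_rows(table, min_cells=3):
    """Return only the rows that contain at least min_cells <td> cells."""
    return [tr for tr in table.findall("./tbody/tr")
            if len(tr.findall("./td")) >= min_cells]


table = ET.fromstring(SAMPLE_HTML)
records = []
for row in data_rows(table):
    # Every path starts with "./", so it is resolved inside the current row.
    company = row.find("./td[1]").text
    address = row.find("./td[2]").text
    contact = row.find("./td[3]/a").text
    records.append((company, address, contact))

print(records)
```

With Selenium the same filter could probably be written with find_elements_by_xpath('./td') on each row, since the find_elements variants return an empty list instead of raising when nothing matches.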
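You can use something like this. The leading dot makes the XPath relative to the current row instead of the whole document:
for row in rows:
    email = row.find_element_by_xpath('.//td[3]/a').text
    company = row.find_element_by_xpath('.//td[1]').text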
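To see why the row-relative form matters, here is a small offline sketch. It is an illustration under assumptions: the sample table is made up, and xml.etree.ElementTree replaces Selenium, so a search from the table root is used to imitate what an XPath starting with // does in Selenium (it searches the whole document, even when called on a row element).

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row table used only for this demonstration.
SAMPLE_HTML = """
<table id="Body_tbl"><tbody>
<tr><td>Acme Recycling</td><td>1 Main St</td><td><a href="mailto:info@acme.example">Jane Doe</a></td></tr>
<tr><td>Beta Metals</td><td>2 Side Rd</td><td><a href="mailto:sales@beta.example">John Roe</a></td></tr>
</tbody></table>
""".strip()

table = ET.fromstring(SAMPLE_HTML)
rows = table.findall("./tbody/tr")

# Document-wide lookup inside the loop: the first match in the whole
# table is returned every time, whichever row is being processed.
document_wide = [table.find(".//tr/td[3]/a").text for _ in rows]

# Row-relative lookup: each row yields its own contact.
row_relative = [row.find("./td[3]/a").text for row in rows]

print(document_wide)
print(row_relative)
```

The first list repeats the first row's contact for every row, which matches the symptom of the original loop; the second list has one distinct value per row.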
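The data you want starts at the third row:
tr[3]//td[1] - contains the company name as text
tr[3]//td[3] - contains the email, but in the href attribute
So loop over the tr elements, starting at index 3 and going up to the length of the rows list: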
rows = driver.find_elements_by_xpath('//*[@id="Body_tbl"]/tbody/tr')
# XPath positions are 1-based, and the data starts at tr[3]
for index in range(3, len(rows) + 1):
    row_xpath = '//*[@id="Body_tbl"]/tbody/tr[' + str(index) + ']'
    # find_elements returns an empty list instead of raising when nothing matches
    company_cells = driver.find_elements_by_xpath(row_xpath + '//td[1]')
    if company_cells:
        company_name = company_cells[0].text
    email_links = driver.find_elements_by_xpath(row_xpath + '//td[3]/a')
    if email_links:
        company_email = email_links[0].get_attribute('href')  # the exact email link, if there is one
Note - I could not test this code, so please watch the boundary conditions. Thanks.
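Since the email sits in the href attribute, the value returned by get_attribute('href') may need a little cleanup. The sketch below assumes the links use the mailto: scheme, which the thread does not confirm; the helper name email_from_href and the sample addresses are made up for illustration.

```python
from urllib.parse import unquote, urlsplit


def email_from_href(href):
    """Return the address from a mailto: link, or None for any other link."""
    if not href:
        return None
    parts = urlsplit(href.strip())
    # Assumption: email links look like "mailto:someone@example.com".
    if parts.scheme.lower() != "mailto":
        return None
    # urlsplit has already separated any "?subject=..." query from the path;
    # unquote decodes percent-escapes such as %2B.
    return unquote(parts.path) or None


print(email_from_href("mailto:info@acme.example?subject=Hello"))
print(email_from_href("http://buyersguide.recyclingtoday.com/search"))
```

In the loop above it could be applied as email_from_href(email_links[0].get_attribute('href')), so that rows whose link is not an email end up with None.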