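I'm having trouble printing table data on a single line. I can only identify the cell with css_selector("td"), and that prints the name, address, city/state and phone stacked in one column, whereas I'm trying to get name, address, city/state, phone on the same row.
HTML: (see the attached image)
This seems like a silly thing to get hung up on, but I've been stuck for a long time and still can't isolate the <br> tags.
Code: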
for x in link:
    driver.get(x)
    try:
        i = 0
        while 0 < 20:
            name = driver.find_elements_by_xpath("/html/body/div[2]/div/div[1]/div/div/table/tbody/tr/td[1]/table/tbody/tr['"+str(i)+"']/td/strong")
            if name[i].is_displayed():
                print(name[i].text)
                i = i + 1
            else:
                i = i + 1
    except (NoSuchElementException, JavascriptException, IndexError):
        continue
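I settled on this approach after trying to simply return the text of the sibling nodes... again to no avail. driver.find_elements_by_css_selector("td") also returns the whole table data... but with the line breaks.
Each <br> adds a new line (\n) to the text of the <td>, so you can split on it or replace it: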
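tds = driver.find_elements_by_css_selector("td")
for td in tds:
    # split on the newlines produced by <br> to get the separate values
    text = td.text.split('\n')
    print(text)  # list: ['text1', 'text2', 'text3', 'text4']
    # or replace the newlines to get everything on one line
    text = td.text.replace('\n', ' ')
    print(text)  # str: 'text1 text2 text3 text4'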
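To get the values onto one row with a visible separator, the split pieces can be joined back together. This is only a minimal sketch: the function name cell_to_row and the sample text are hypothetical, and in the real script the string would come from td.text.

```python
def cell_to_row(cell_text, sep=" | "):
    """Collapse the newline-separated text of one <td> into a single line."""
    # splitlines() handles both \n and \r\n; strip() trims spaces around each piece
    parts = [part.strip() for part in cell_text.splitlines()]
    # drop empty pieces, which appear when two <br> tags follow each other
    parts = [part for part in parts if part]
    return sep.join(parts)


# Hypothetical sample standing in for td.text
cell_text = "Acme Dental\n12 Main St\nSpringfield, IL\n555-0100"
print(cell_to_row(cell_text))
```

A separator other than a comma keeps "City, State" readable as one field.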
If you can identify the parent <td> element with css_selector("td"), you can print the name, address, city/state and phone using the following Locator Strategies. They need these imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Name:
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "td>strong"))).get_attribute("innerHTML"))
Address:
print(driver.execute_script('return arguments[0].childNodes[3].textContent;', WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "td")))).strip())
City/State:
print(driver.execute_script('return arguments[0].childNodes[5].textContent;', WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "td")))).strip())
Phone:
print(driver.execute_script('return arguments[0].lastChild.textContent;', WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "td")))).strip())
BeautifulSoup can also be used in this case.
>>> from bs4 import BeautifulSoup
>>> import requests
>>> contents = requests.get(url).text
>>> soup = BeautifulSoup(contents, 'lxml')
>>> text = soup.find('body').text
Then check for the 'br' tags and skip them.
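The idea of treating each <br> as a field boundary can be sketched without a browser. The example below deliberately uses Python's standard-library html.parser instead of BeautifulSoup, only so that it runs with no extra installs. The class name CellFieldParser and the sample markup are hypothetical (the real page's HTML is only shown in the question's image), and the sketch assumes the cells are not nested inside one another.

```python
from html.parser import HTMLParser


class CellFieldParser(HTMLParser):
    """Collect the text of each <td>, starting a new field at every <br>."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one list of fields per non-empty <td>
        self._fields = None  # fields of the <td> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._fields = [""]
        elif tag == "br" and self._fields is not None:
            self._fields.append("")  # a <br> ends the current field

    def handle_endtag(self, tag):
        if tag == "td" and self._fields is not None:
            fields = [f.strip() for f in self._fields if f.strip()]
            if fields:
                self.rows.append(fields)
            self._fields = None

    def handle_data(self, data):
        if self._fields is not None:
            self._fields[-1] += data


# Hypothetical markup standing in for the page source
html = ("<table><tr><td><strong>Acme Dental</strong><br>12 Main St"
        "<br>Springfield, IL<br>555-0100</td></tr></table>")
parser = CellFieldParser()
parser.feed(html)
for fields in parser.rows:
    print(" | ".join(fields))
```

With Selenium, driver.page_source could be fed to the parser in place of the sample string.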
for x in link:
    driver.get(x)
    try:
        names = driver.find_elements_by_css_selector("td")
        i = 0
        # '<' rather than '<=' so the last index stays in range
        while i < len(names):
            # each <br> becomes a line break in .text, so one line per field
            address = names[i].text.splitlines()
            r = len(address)
            if r == 4:
                print(x, " | ", address[0], " | ", address[1], " | ", address[2], " | ", address[3])
            elif r == 3:
                print(x, " | ", address[0], " | ", address[1], " | ", address[2])
            i = i + 1
    except (NoSuchElementException, IndexError):
        continue
This did the job.
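The row-building logic can also be pulled out of the Selenium loop so it can be checked without a browser. A rough sketch, assuming the same rule as the loop above (only cells with 3 or 4 lines are kept); the function name rows_from_cells and the sample data are hypothetical, and the separator here is a single " | " rather than the extra spaces print() adds between arguments.

```python
def rows_from_cells(url, cell_texts, sep=" | "):
    """Return one output line per cell whose text has 3 or 4 lines."""
    rows = []
    for cell_text in cell_texts:
        fields = cell_text.splitlines()
        # keep only cells with 3 or 4 lines, skip everything else
        if len(fields) in (3, 4):
            rows.append(sep.join([url, *fields]))
    return rows


# Hypothetical cell texts standing in for [td.text for td in names]
sample = ["Acme Dental\n12 Main St\nSpringfield, IL\n555-0100", "Directory"]
for row in rows_from_cells("http://example.com", sample):
    print(row)
```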