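I'm trying to scrape all the data from a website called Correios. The page has several drop-down menus that I need to work through, and I've run into a problem: my code returns a list with a bunch of empty strings.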
from selenium import webdriver

chrome_path = r"C:\Users\Gustavo\Desktop\geckodriver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
lista_x = []
driver.get("http://www2.correios.com.br/sistemas/agencias/")
driver.maximize_window()
dropdownEstados = driver.find_elements_by_xpath("""//*[@id="estadoAgencia"]""")
optEstados = driver.find_elements_by_tag_name("option")
for valores in optEstados:
    print(valores.text.encode())
This is what I get from it:
b''
b'ACRE'
b'ALAGOAS'
b'AMAP\xc3\x81'
b'AMAZONAS'
b'BAHIA'
b'CEAR\xc3\x81'
b'DISTRITO FEDERAL'
b'ESP\xc3\x8dRITO SANTO'
b'GOI\xc3\x81S'
b'MARANH\xc3\x83O'
b'MINAS GERAIS'
b'MATO GROSSO DO SUL'
b'MATO GROSSO'
b'PAR\xc3\x81'
b'PARA\xc3\x8dBA'
b'PERNAMBUCO'
b'PIAU\xc3\x8d'
b'PARAN\xc3\x81'
b'RIO DE JANEIRO'
b'RIO GRANDE DO NORTE'
b'ROND\xc3\x94NIA'
b'RORAIMA'
b'RIO GRANDE DO SUL'
b'SANTA CATARINA'
b'SERGIPE'
b'S\xc3\x83O PAULO'
b'TOCANTINS'
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
How do I get rid of the empty b'' entries?
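If I understand correctly, you want to find all the options of that drop-down.
Try this XPath to locate the drop-down's option elements: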
//*[@id="estadoAgencia"]/option
Code example:
from selenium import webdriver

chrome_path = r"C:\Users\Gustavo\Desktop\geckodriver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
lista_x = []
driver.get("http://www2.correios.com.br/sistemas/agencias/")
driver.maximize_window()
dropdownEstados = driver.find_elements_by_xpath("//*[@id='estadoAgencia']")
# find only the <option> elements inside the estadoAgencia drop-down
optEstados = driver.find_elements_by_xpath("//*[@id='estadoAgencia']/option")
for valores in optEstados:
    print(valores.text.encode())
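This XPath returns every option of that drop-down and nothing else. One empty string remains, because it is part of the drop-down itself. Output: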
b''
b'ACRE'
b'ALAGOAS'
b'AMAP\xc3\x81'
b'AMAZONAS'
b'BAHIA'
b'CEAR\xc3\x81'
b'DISTRITO FEDERAL'
b'ESP\xc3\x8dRITO SANTO'
b'GOI\xc3\x81S'
b'MARANH\xc3\x83O'
b'MINAS GERAIS'
b'MATO GROSSO DO SUL'
b'MATO GROSSO'
b'PAR\xc3\x81'
b'PARA\xc3\x8dBA'
b'PERNAMBUCO'
b'PIAU\xc3\x8d'
b'PARAN\xc3\x81'
b'RIO DE JANEIRO'
b'RIO GRANDE DO NORTE'
b'ROND\xc3\x94NIA'
b'RORAIMA'
b'RIO GRANDE DO SUL'
b'SANTA CATARINA'
b'SERGIPE'
b'S\xc3\x83O PAULO'
b'TOCANTINS'
Note: the first element is an empty string because the first option of the drop-down has no text.
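The b'...' prefixes and the \xc3\x81-style escapes in the output come from the .encode() call, which turns each option's text into UTF-8 bytes before printing. As a small illustration (the sample values are copied from the output above), decoding the bytes restores the readable names; printing valores.text without .encode() should have the same effect:

```python
# Sample values copied from the output above; str.encode() defaults to UTF-8.
raw = [b'AMAP\xc3\x81', b'CEAR\xc3\x81', b'S\xc3\x83O PAULO']

# Decoding reverses .encode() and restores the accented characters.
names = [item.decode('utf-8') for item in raw]

for name in names:
    print(name)
```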
Your code only needs a small change:
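# find_element (singular) returns the drop-down element itself
dropdownEstados = driver.find_element_by_xpath("""//*[@id="estadoAgencia"]""")
# look for <option> tags inside that drop-down only, not on the whole page
optEstados = dropdownEstados.find_elements_by_tag_name("option")
for valores in optEstados:
    print(valores.text.encode())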
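To see why scoping the search to the drop-down matters, here is a minimal sketch that uses the standard library's xml.etree.ElementTree on a made-up page fragment instead of Selenium, so it runs without a browser. Only the id estadoAgencia comes from the question; the second drop-down (municipioAgencia) and its blank options are assumptions chosen to mimic the symptom.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: only the id "estadoAgencia" comes from the question;
# "municipioAgencia" and the option contents are assumptions.
PAGE = """
<form>
  <select id="estadoAgencia">
    <option></option>
    <option>ACRE</option>
    <option>ALAGOAS</option>
  </select>
  <select id="municipioAgencia">
    <option></option>
    <option></option>
  </select>
</form>
"""

root = ET.fromstring(PAGE)

# Document-wide search: every <option> on the page, from both drop-downs.
all_options = [opt.text or "" for opt in root.iter("option")]

# Scoped search: only the <option> children of the estadoAgencia drop-down.
estado_options = [
    opt.text or ""
    for opt in root.findall(".//*[@id='estadoAgencia']/option")
]

print(all_options)
print(estado_options)
```

The document-wide list picks up the blank options of the second drop-down, while the scoped list contains only the entries of estadoAgencia.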
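To retrieve the text of all the <option> elements of the drop-down with the id estadoAgencia: since it is a <select> tag, it is easier and more effective to use the methods associated with <select> tags. You can use the following solution: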
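from selenium.webdriver.support.ui import Select

# Selenium 4.3+ removed find_element_by_id; use driver.find_element(By.ID, 'estadoAgencia') there
estado_select = Select(driver.find_element_by_id('estadoAgencia'))
for opt in estado_select.options:
    print(opt.get_attribute('innerHTML'))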
Output:
ACRE
ALAGOAS
AMAPÁ
AMAZONAS
BAHIA
CEARÁ
DISTRITO FEDERAL
ESPÍRITO SANTO
GOIÁS
MARANHÃO
MINAS GERAIS
MATO GROSSO DO SUL
MATO GROSSO
PARÁ
PARAÍBA
PERNAMBUCO
PIAUÍ
PARANÁ
RIO DE JANEIRO
RIO GRANDE DO NORTE
RONDÔNIA
RORAIMA
RIO GRANDE DO SUL
SANTA CATARINA
SERGIPE
SÃO PAULO
TOCANTINS
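If a blank entry still shows up (for example a placeholder first option), it can be filtered out after the texts are collected. A minimal sketch, assuming the elements expose a .text attribute the way Selenium's WebElement does; non_empty_texts is a hypothetical helper name, and SimpleNamespace objects stand in for real elements so the snippet runs without a browser:

```python
from types import SimpleNamespace


def non_empty_texts(elements):
    """Return the stripped .text of each element, skipping blank ones."""
    return [el.text.strip() for el in elements if el.text and el.text.strip()]


# Stand-ins for WebElement objects; only the .text attribute is used.
fake_options = [SimpleNamespace(text=t) for t in ["", "ACRE", "ALAGOAS", "  "]]

print(non_empty_texts(fake_options))
```

With Selenium, the same helper could be called on the list from the earlier snippets, for example non_empty_texts(optEstados).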