Get href link using Python Playwright

Question

I am trying to extract the link inside the href, but all I get is the text inside the element.

The website code is as follows:

<div class="item-info-container ">
   <a href="/imovel/32600863/" role="heading" aria-level="2" class="item-link xh-highlight" 
   title="Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga">
   Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga
   </a>

The code I am using is:

element_handle = page.locator('//div[@class="item-info-container "]//a').all_inner_texts()

Whether or not I specify //a[@href], my output is always the title text:

Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga

What I actually want to get is:

/imovel/32600863/

Where does my logic fail? Any ideas?

Answer 1

Use get_attribute:

link = page.locator('.item-info-container ').get_by_role('link').get_attribute('href')

For multiple locators:

link_locators = page.locator('.item-info-container ').get_by_role('link').all()
for link_locator in link_locators:
    print(link_locator.get_attribute('href'))
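
Some context on why the question's code returns text: all_inner_texts() returns the inner text of every matched element, and the [@href] predicate only narrows which <a> elements match; it does not change what is read from them. Also note that get_attribute() called directly on a locator that matches more than one element raises a strict-mode error, so the loop form is the safer choice on a listing page. The sketch below is one way to put this together, not the answerer's exact code. collect_hrefs, run_demo and SAMPLE_HTML are hypothetical names, and the markup is reduced from the question. Because the <a> in the question carries role="heading", a role-based lookup for 'link' may not match it, so the sketch assumes a plain CSS selector instead. It also assumes Playwright and a Chromium build are installed.

```python
# Sketch only: collect_hrefs, run_demo and SAMPLE_HTML are illustrative names,
# not part of Playwright or of the original answers.

SAMPLE_HTML = """
<div class="item-info-container ">
  <a href="/imovel/32600863/" role="heading" aria-level="2" class="item-link">
    Apartamento T3 na avenida da Liberdade
  </a>
</div>
"""


def collect_hrefs(page, selector=".item-info-container a"):
    """Return the raw href attribute of every element matched by selector."""
    # locator.all() yields one locator per matched element, so each
    # get_attribute() call targets exactly one <a>.
    return [link.get_attribute("href") for link in page.locator(selector).all()]


def run_demo():
    """Load SAMPLE_HTML into a headless browser and print each href."""
    # Imported here so collect_hrefs can be reused without launching a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(SAMPLE_HTML)
        for href in collect_hrefs(page):
            print(href)
        browser.close()
```

Calling run_demo() should print /imovel/32600863/ for the sample markup. On the real site, page.goto(...) would take the place of set_content.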

Answer 2

Just omit the // before the a and use the following XPath 1.0 expression:

//div[@class="item-info-container "]/a/@href

This gives you the value of the @href attribute: "/imovel/32600863/".
The whole command could be:

element_handle = page.locator('//div[@class="item-info-container "]/a/@href').all_inner_texts()

However, the result of the expression is an attribute rather than an element, so this may fail.
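
The caveat above matters in practice: as far as I know, a Playwright locator has to resolve to elements, so an XPath ending in /@href is not expected to work as a selector. One way around it, sketched here under that assumption, is to keep the XPath for selecting the <a> elements and read the attribute in a second step with evaluate_all. hrefs_via_xpath and LINK_XPATH are hypothetical names, and page is assumed to be an already opened Playwright page.

```python
# Sketch only: hrefs_via_xpath and LINK_XPATH are illustrative names,
# not part of Playwright or of the original answer.

# The XPath stops at the <a> element; the attribute is read afterwards.
LINK_XPATH = 'xpath=//div[@class="item-info-container "]/a'


def hrefs_via_xpath(page):
    """Select the <a> elements with XPath, then read href from each of them."""
    # evaluate_all calls the JavaScript function once, passing it an array
    # of all matched elements, and returns the function's result.
    return page.locator(LINK_XPATH).evaluate_all(
        "links => links.map(link => link.getAttribute('href'))"
    )
```

With a page that has already navigated to the listing, print(hrefs_via_xpath(page)) would print a list of href strings.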

Answer 3

I managed to do this by finding all the elements first and then getting the attribute from each one.

handle_links = page.locator('//div[@class="item-info-container "]/a')
# Read the href attribute from every matched <a> element
for link in handle_links.element_handles():
    href = link.get_attribute('href')
    print(href)

The result will be:

/imovel/32611494/
/imovel/32642523/
/imovel/32633771/
/imovel/32527162/
/imovel/30344934/
/imovel/31221488/
/imovel/32477875/
/imovel/31221480/
/imovel/32450120/
/imovel/32515628/
/imovel/32299064/
