关于您的示例代码并假设您只想最后提取数字,我们可以使用
xpath
参数的解决方法并排除 <svg>
标记内的所有内容,然后 purrr::discard
所有空字符串:
library(rvest)
library(purrr)
html |>
read_html(html) |>
html_elements("p") |>
html_nodes(xpath='//*[not(name()="svg")]/text()') |>
html_text(trim=TRUE) |>
purrr::discard(\(x) x == "")
#> [1] "94 - 100 m²" "3" "3" "2"
来自OP的数据
html <- '<section class="card__amenities ">
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity" itemprop="floorSize"><span data-testid="l-icon" role="document" aria-label="Tamanho do imóvel" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">...</svg></span> 94 - 100 m² </p>
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity" itemprop="numberOfRooms"><span data-testid="l-icon" role="document" aria-label="Quantidade de quartos" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">...</svg></span> 3 </p>
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity" itemprop="numberOfBathroomsTotal"<span data-testid="l-icon" role="document" aria-label="Quantidade de banheiros" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">...</svg></span>3</p>
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity"><span data-testid="l-icon" role="document" aria-label="Quantidade de vagas de garagem" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><...</svg></span>2</p>
</section>'
创建于 2023-09-15,使用 reprex v2.0.2