How do I get the HTML element that comes before an element with a certain class?

Question

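I am scraping a page and am having trouble getting the "th" element that comes before another "th" element whose class contains "type2". I would prefer to identify it as the "th" that precedes the "th" with class "type2", because my HTML has many "th" elements and this is the only difference I have found between the tables.

Using rvest or xml2 (or another R package), can I get this parent? The content I want is "text_that_I_want".

Thank you!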

<tr>
    <th class="array">text_that_I_want</th>
    <td class="array">
        <table>
            <thead>
                <tr>
                    <th class="string type2">name</th>
                    <th class="array type2">answers</th>
                </tr>
            </thead>

Answer 1

The more formal and more general way to navigate relative to a given node in XPath is through the ancestor and preceding-sibling axes:

library(rvest)

# htmldoc: the HTML source from the question (string, file path, or URL)
read_html(htmldoc) %>%
  html_nodes(xpath = "//th[@class = 'string type2']/ancestor::td/preceding-sibling::th") %>%
  html_text()
#> [1] "text_that_I_want"
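
The XPath above compares the whole class attribute with 'string type2', so it would miss a header whose classes appear in a different order. A minimal self-contained sketch of a class-token test follows; the HTML string is a hypothetical completed version of the question's fragment (the closing tags are assumed), and the names html, doc, xp and result are illustrative only.

```r
library(rvest)

# Hypothetical completion of the question's fragment (closing tags are assumed)
html <- '
<table>
  <tr>
    <th class="array">text_that_I_want</th>
    <td class="array">
      <table>
        <thead>
          <tr>
            <th class="string type2">name</th>
            <th class="array type2">answers</th>
          </tr>
        </thead>
      </table>
    </td>
  </tr>
</table>'

doc <- read_html(html)

# Match "type2" as a whole class token, whatever other classes are present,
# then step up to the nearest enclosing <td> and back to the nearest preceding <th>
xp <- paste0(
  "//th[contains(concat(' ', normalize-space(@class), ' '), ' type2 ')]",
  "/ancestor::td[1]/preceding-sibling::th[1]"
)

result <- doc %>%
  html_nodes(xpath = xp) %>%
  html_text()

print(result)
```

Both "type2" headers sit inside the same td, so they lead to the same preceding th, and XPath returns that node only once.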

Answer 2

We can search all the <th> elements for the string "type2", take the index of the first occurrence, and subtract 1 to get the index we want.

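library(dplyr)
library(rvest)
library(stringr)  # provides str_detect()

# test: the parsed document, e.g. the result of read_html()
# Flag each <th> whose markup contains "type2"
location <- test %>%
  html_nodes('th') %>%
  as.character() %>%
  str_detect("type2")

# Index of the <th> just before the first match
index_want <- min(which(location)) - 1

test %>%
  html_nodes('th') %>%
  .[[index_want]] %>%
  html_text()
#> [1] "text_that_I_want"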
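
The index approach above returns only the label before the first "type2" header. Since the question mentions many tables, here is a hedged sketch of the same idea applied to every run of "type2" headers. The two-row HTML, the label texts first_label and second_label, and the variable names are all made up for illustration.

```r
library(rvest)

# Hypothetical page with two outer rows, each holding a nested "type2" table
html <- '
<table>
  <tr>
    <th class="array">first_label</th>
    <td class="array"><table><thead><tr>
      <th class="string type2">name</th>
      <th class="array type2">answers</th>
    </tr></thead></table></td>
  </tr>
  <tr>
    <th class="array">second_label</th>
    <td class="array"><table><thead><tr>
      <th class="string type2">name</th>
    </tr></thead></table></td>
  </tr>
</table>'

ths <- read_html(html) %>% html_nodes("th")

# TRUE where "type2" is one of the whitespace-separated class tokens
classes <- html_attr(ths, "class")
is_type2 <- vapply(
  strsplit(classes, "\\s+"),
  function(tokens) "type2" %in% tokens,
  logical(1)
)

# Positions where a run of type2 headers starts (the previous <th> is not type2)
starts <- which(is_type2 & !c(FALSE, head(is_type2, -1)))
starts <- starts[starts > 1]  # a run starting at position 1 has no predecessor

labels <- html_text(ths[starts - 1])
print(labels)
```

Like the original answer, this relies on the th elements being returned in document order, so the label is always the th immediately before the first "type2" header of each run.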
