I'm trying to extract the data from the table below.
GenderStats.html
<table class="default_table">
<thead>
<tr>
<th align="center" valign="middle" style="border-top:solid thin;border-bottom:solid thin" rowspan="1" colspan="1">List</th>
<th align="center" valign="middle" style="border-top:solid thin;border-bottom:solid thin" rowspan="1" colspan="1">Values, n (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3" align="center" valign="middle" style="border-bottom:solid thin" colspan="1">Gender <br>Male <br>Female </td>
<td align="center" valign="middle" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="1" colspan="1">75 (74.3)</td>
</tr>
<tr>
<td align="center" valign="middle" style="border-bottom:solid thin" rowspan="1" colspan="1">26 (25.7)</td>
</tr>
</tbody>
</table>
I tried the code below, but it returns a table like the one in the attached screenshot.
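library(rvest)
# Parse every <table> in the file into a list of data frames
tbls <- html_table(read_html("C:/Users/GenderStats.html"))
# Bind each table to its own variable: Table1, Table2, ...
for (t in seq_along(tbls)) {
  assign(paste0("Table", t), tbls[[t]])
}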
Is there a way to get a table like the one attached below?
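Here is a solution that extracts each value in `List` and gives it its own row in the data frame from `tbls`. After that, you only need to drop the rows where `Values, n (%)` is empty: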
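library(tidyverse)
tbls[[1]] |>
  rownames_to_column() |>  # keep the row index as a "rowname" column
  rowwise() |>             # evaluate the split separately for each row
  # take the n-th space-separated label for row n
  mutate(List = str_split_1(List, " ")[[as.numeric(rowname)]]) |>
  filter(`Values, n (%)` != "") |>  # drop the row with no value
  select(-rowname)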
# A tibble: 2 × 2
# Rowwise:
  List   `Values, n (%)`
  <chr>  <chr>
1 Male   75 (74.3)
2 Female 26 (25.7)
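If you would rather avoid `rowwise()`, here is a minimal sketch of the same idea that splits the repeated cell only once. It is an illustration under a few assumptions, not a drop-in replacement: the table is inlined with `minimal_html()` (styling attributes omitted) so the example runs without the file, every label is a single word, and the labels appear in the same order as the rows. The names `page`, `tbl`, `labels` and `result` are made up for this example.

```r
library(rvest)
library(dplyr)
library(stringr)

# Inline copy of the table from the question, so no file on disk is needed.
page <- minimal_html('
<table>
<thead><tr><th>List</th><th>Values, n (%)</th></tr></thead>
<tbody>
<tr><td rowspan="3">Gender <br>Male <br>Female </td><td></td></tr>
<tr><td>75 (74.3)</td></tr>
<tr><td>26 (25.7)</td></tr>
</tbody>
</table>')

tbl <- html_table(page)[[1]]

# The rowspan cell is repeated on every row, so split the first copy once.
# str_squish() collapses any run of whitespace (including newlines) to one space.
labels <- str_split_1(str_squish(tbl$List[1]), " ")

result <- tbl |>
  mutate(List = labels[row_number()]) |>                   # n-th label for row n
  filter(!is.na(`Values, n (%)`), `Values, n (%)` != "")   # drop the label-only row

result
```

Because this version never calls `rowwise()`, the result is a plain tibble rather than a row-wise one. If a label can contain spaces (for example "Not stated"), splitting on whitespace will no longer line up with the rows and you would need a different separator.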