I have a dataset called "Area":
House_No. Info_On_Area
1a Names of neighbouringhouse in 100m 1b 1c 1d 1e
1a Area of neighbouringhouse in 100m 500 1000 1500 300
1a Names of neighbouringhouse in 300m 1b 1c 1d 1e 1f 1g 1h
1a Area of neighbouringhouse in 300m 500 1000 1500 300 600 400 2000
2a Names of neighbouringhouse in 100m 2b 2c 2d 2e
2a Area of neighbouringhouse in 100m 500 1000 1500 300
2a Names of neighbouringhouse in 300m 2b 2c 2d 2e 2f 2g 2h
2a Area of neighbouringhouse in 300m 500 1000 1500 300 600 400 2000
I want to create a data frame so that the table looks like this:
House_No. Area of neighbouringhouse in 100m Area of neighbouringhouse in 300m
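I used dplyr to group by house number (CT %>% group_by(House_No.)) and then tried rowSums. However, I get an error saying the information is not numeric. I think this is because I need to convert the numbers in the row values to numeric, but I am not sure how to do that. I am stuck at this stage and cannot proceed.
I did look at similar questions, such as "Sum rows in data.frame or matrix" and "Sum by Rows in R", but none of them seem to deal with a data frame like mine, where the values to sum are embedded in the row text.
Any help would be appreciated! Thanks :)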
Use the stringr::str_extract_* functions to retrieve the numbers, then spread the result:
library(tidyverse)

df %>%
  # extract everything up to 1+ digits followed by m
  mutate(flag = str_extract(Info_On_Area, '.*\\d+m'),
         # extract any 1 or more digits followed by space or at the end
         SumArea = map_dbl(Info_On_Area, ~sum(as.numeric(str_extract_all(.x, '\\d+(?=\\s|$)', simplify = TRUE))))) %>%
  filter(str_detect(Info_On_Area, 'Area')) %>%
  spread(flag, SumArea)
# A tibble: 4 x 4
  House_No. Info_On_Area                                                 `Area of neighbouringhouse in 10~ `Area of neighbouringhouse in 30~
  <chr>     <chr>                                                                                    <dbl>                             <dbl>
1 1a        Area of neighbouringhouse in 100m 500 1000 1500 300                                       3300                                NA
2 1a        Area of neighbouringhouse in 300m 500 1000 1500 300 600 400~                                NA                              6300
3 2a        Area of neighbouringhouse in 100m 500 1000 1500 300                                       3300                                NA
4 2a        Area of neighbouringhouse in 300m 500 1000 1500 300 600 400~                                NA                              6300
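The output above still carries the Info_On_Area column, so each house spans two rows with NA in one of the sum columns. If one row per house is wanted, as in the layout the question asks for, a possible variation is to drop Info_On_Area before reshaping. The sketch below is not part of the original answer: it assumes tidyr's pivot_wider() in place of spread(), and it uses a trimmed copy of the question's data so that it runs on its own.

```r
library(dplyr)
library(stringr)
library(purrr)
library(tidyr)

# Trimmed copy of the question's data: one "Names" row plus the "Area" rows
df <- data.frame(
  House_No. = c("1a", "1a", "1a", "2a", "2a"),
  Info_On_Area = c(
    "Names of neighbouringhouse in 100m 1b 1c 1d 1e",
    "Area of neighbouringhouse in 100m 500 1000 1500 300",
    "Area of neighbouringhouse in 300m 500 1000 1500 300 600 400 2000",
    "Area of neighbouringhouse in 100m 500 1000 1500 300",
    "Area of neighbouringhouse in 300m 500 1000 1500 300 600 400 2000"
  ),
  stringsAsFactors = FALSE
)

result <- df %>%
  # keep only the rows that describe areas
  filter(str_detect(Info_On_Area, "Area")) %>%
  mutate(
    # label: everything up to and including the distance, e.g. "... in 100m"
    flag = str_extract(Info_On_Area, ".*\\d+m"),
    # sum of the standalone numbers; "100m" is skipped by the lookahead
    SumArea = map_dbl(
      Info_On_Area,
      ~ sum(as.numeric(str_extract_all(.x, "\\d+(?=\\s|$)")[[1]]))
    )
  ) %>%
  # dropping the raw text lets the two rows per house collapse into one
  select(-Info_On_Area) %>%
  pivot_wider(names_from = flag, values_from = SumArea)

result
```

With this data, result has one row each for houses 1a and 2a, with 3300 in the 100m column and 6300 in the 300m column, matching the sums shown above.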
Data
df <- structure(list(House_No. = c("1a", "1a", "1a", "1a", "2a", "2a",
"2a", "2a"), Info_On_Area = c("Names of neighbouringhouse in 100m 1b 1c 1d 1e",
"Area of neighbouringhouse in 100m 500 1000 1500 300", "Names of neighbouringhouse in 300m 1b 1c 1d 1e 1f 1g 1h",
"Area of neighbouringhouse in 300m 500 1000 1500 300 600 400 2000",
"Names of neighbouringhouse in 100m 2b 2c 2d 2e", "Area of neighbouringhouse in 100m 500 1000 1500 300",
"Names of neighbouringhouse in 300m 2b 2c 2d 2e 2f 2g 2h",
"Area of neighbouringhouse in 300m 500 1000 1500 300 600 400 2000"
)), class = "data.frame", row.names = c(NA, -8L))
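
The "not numeric" error from rowSums mentioned in the question arises because each Info_On_Area cell is a single character string, not a set of numeric columns. As a minimal base R sketch of the conversion step on one value (the variable names x, tokens, areas and total are made up for illustration):

```r
x <- "Area of neighbouringhouse in 100m 500 1000 1500 300"

# The whole cell is text, which is why rowSums() cannot use it directly
cell_is_numeric <- is.numeric(x)

# Split on whitespace, then keep only tokens made up entirely of digits;
# this drops the words and the "100m" distance label
tokens <- strsplit(x, "\\s+")[[1]]
areas <- as.numeric(tokens[grepl("^[0-9]+$", tokens)])

# 500 + 1000 + 1500 + 300
total <- sum(areas)
```

Here cell_is_numeric is FALSE and total is 3300, the same figure the stringr pipeline produces for the 100m rows.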