Sum rows with specific values within grouped data in R


I have a dataset called "Area":

House_No. Info_On_Area
1a        Names of neighbouringhouse in 100m  1b   1c    1d    1e 
1a        Area of neighbouringhouse  in 100m  500  1000  1500  300
1a        Names of neighbouringhouse in 300m  1b   1c    1d    1e   1f    1g   1h
1a        Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400  2000
2a        Names of neighbouringhouse in 100m  2b   2c    2d    2e 
2a        Area of neighbouringhouse  in 100m  500  1000  1500  300
2a        Names of neighbouringhouse in 300m  2b   2c    2d    2e   2f    2g   2h
2a        Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400  2000

I want to create a data frame in which the table looks like this:

House_No. Area of neighbouringhouse in 100m Area of neighbouringhouse  in 300m 

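I used dplyr to group by the different house numbers (CT %>% group_by(House_No.)) and then tried rowSums. However, I get an error saying that the information is not numeric. I think this is because the numbers in the row values need to be converted to numeric, but I'm not sure how to do that. I'm stuck at this point and can't get any further.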
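rowSums() only accepts numeric input. If the numbers were read in as text, a data frame holding them produces exactly this kind of "not numeric" error. The sketch below uses base R and assumes a hypothetical layout in which each area sits in its own character column. The column names V1 to V4 are made up for illustration. It reproduces the error and then converts the columns before summing.

```r
# Hypothetical layout: each area is in its own column, but stored as text
areas <- data.frame(
  V1 = c("500", "500"), V2 = c("1000", "1000"),
  V3 = c("1500", "1500"), V4 = c("300", "300"),
  stringsAsFactors = FALSE  # keep text as character, not factor (factors would convert to codes)
)

# rowSums() rejects a character data frame, which reproduces the question's error
failed <- inherits(try(rowSums(areas), silent = TRUE), "try-error")

# Convert every column to numeric first; a non-number such as "1b" would become NA (with a warning)
areas_num <- as.data.frame(lapply(areas, as.numeric))
totals <- rowSums(areas_num, na.rm = TRUE)  # named by row: 3300 for each row
print(failed)
print(totals)
```

If the numbers are instead packed into a single text column, as in the data used by the answer, they first have to be extracted from the string. The answer does this with regular expressions.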

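I did look at similar solutions, such as "Sum rows in data.frame or matrix" and "Sum by Rows in R", but they don't seem to involve a data frame like mine; they just sum plain numeric row values.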

Any help would be greatly appreciated! Thanks :)

r dplyr rows grouped-table
1 Answer

Use stringr::str_extract_* to pull out the numbers, then spread the result:

library(tidyverse)
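df %>%
  # flag: the label, i.e. everything up to the last run of digits followed by "m"
  mutate(flag = str_extract(Info_On_Area, '.*\\d+m'),
         # SumArea: sum every run of 1+ digits followed by whitespace or the end of the string
         SumArea = map_dbl(Info_On_Area, ~sum(as.numeric(str_extract_all(.x, '\\d+(?=\\s|$)', simplify = TRUE))))) %>%
  # keep only the "Area of ..." rows
  filter(str_detect(Info_On_Area, 'Area')) %>%
  # one column per distance; Info_On_Area is still present, so each input row stays a separate row
  spread(flag, SumArea)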

# A tibble: 4 x 4
  House_No. Info_On_Area                                                         `Area of neighbouringhouse  in 10~ `Area of neighbouringhouse  in 30~
  <chr>     <chr>                                                                                             <dbl>                              <dbl>
1 1a        Area of neighbouringhouse  in 100m  500  1000  1500  300                                           3300                                 NA
2 1a        Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400~                                 NA                               6300
3 2a        Area of neighbouringhouse  in 100m  500  1000  1500  300                                           3300                                 NA
4 2a        Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400~                                 NA                               6300
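The following sketch shows what the two regular expressions in the answer pick out. It runs them on one of the question's strings. The variable names here are illustrative only. The greedy `.*\\d+m` keeps everything up to the distance label. The lookahead `(?=\\s|$)` keeps only digit runs followed by whitespace or the end of the string, so the "300" in "300m" and the "1" in house names such as "1b" are not counted.

```r
library(stringr)

x <- "Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400  2000"

# Greedy match up to the last "<digits>m": the row label used as the new column name
label <- str_extract(x, ".*\\d+m")

# Digit runs followed by whitespace or end of string; "300m" is skipped because "m" follows
nums <- as.numeric(str_extract_all(x, "\\d+(?=\\s|$)")[[1]])
total <- sum(nums)

# A "Names of ..." row yields no matches at all ("1b", "1c", ... are followed by letters)
names_row <- "Names of neighbouringhouse in 100m  1b   1c    1d    1e"
name_hits <- str_extract_all(names_row, "\\d+(?=\\s|$)")[[1]]

print(label)
print(total)
print(length(name_hits))
```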

Data

df <- structure(list(House_No. = c("1a", "1a", "1a", "1a", "2a", "2a", 
"2a", "2a"), Info_On_Area = c("Names of neighbouringhouse in 100m  1b   1c    1d    1e", 
"Area of neighbouringhouse  in 100m  500  1000  1500  300", "Names of neighbouringhouse in 300m  1b   1c    1d    1e   1f    1g   1h", 
"Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400  2000", 
"Names of neighbouringhouse in 100m  2b   2c    2d    2e", "Area of neighbouringhouse  in 100m  500  1000  1500  300", 
"Names of neighbouringhouse in 300m  2b   2c    2d    2e   2f    2g   2h", 
"Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400  2000"
)), class = "data.frame", row.names = c(NA, -8L))
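The answer's output still has one row per input line, with NAs, because Info_On_Area is kept when spreading. The question asked for one row per house. Here is a minimal variant that is not part of the original answer. It drops that column first and then uses tidyr's pivot_wider(), which supersedes spread(). It assumes tidyr 1.0.0 or later and the same packages the answer loads via tidyverse. With the question's data it should give two rows (1a and 2a), with 3300 in the 100m column and 6300 in the 300m column.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

# Same rows as the question's data: House_No. plus one text column
df <- data.frame(
  House_No. = rep(c("1a", "2a"), each = 4),
  Info_On_Area = c(
    "Names of neighbouringhouse in 100m  1b   1c    1d    1e",
    "Area of neighbouringhouse  in 100m  500  1000  1500  300",
    "Names of neighbouringhouse in 300m  1b   1c    1d    1e   1f    1g   1h",
    "Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400  2000",
    "Names of neighbouringhouse in 100m  2b   2c    2d    2e",
    "Area of neighbouringhouse  in 100m  500  1000  1500  300",
    "Names of neighbouringhouse in 300m  2b   2c    2d    2e   2f    2g   2h",
    "Area of neighbouringhouse  in 300m  500  1000  1500  300  600   400  2000"
  ),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # keep only the "Area of ..." rows; the "Names of ..." rows hold no areas
  filter(str_detect(Info_On_Area, "^Area")) %>%
  mutate(
    # column label, e.g. "Area of neighbouringhouse  in 100m"
    flag = str_extract(Info_On_Area, ".*\\d+m"),
    # sum of the digit runs followed by whitespace or the end of the string
    SumArea = map_dbl(Info_On_Area,
                      ~ sum(as.numeric(str_extract_all(.x, "\\d+(?=\\s|$)")[[1]])))
  ) %>%
  # drop the raw text so House_No. is the only identifying column left
  select(-Info_On_Area) %>%
  pivot_wider(names_from = flag, values_from = SumArea)

print(wide)
```

Dropping Info_On_Area is the key step. Without it, every remaining column acts as a row identifier, and the reshape cannot merge the 100m and 300m rows of the same house.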
© www.soinside.com 2019 - 2024. All rights reserved.