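For this task, my real dataset has 18 rows with indcode = 000000 and ownership code = 10; what distinguishes them is area. Likewise, it has 18 rows with indcode = 4911 and ownership code = 10. To keep the calculation simple, the sample data below cuts this down to 4 rows each. Some background: the real dataset holds monthly data for the current year (02), with one column per month, starting with 02-Jan (January). 910 is the new indcode. It represents total federal employment for a given area and period, where federal employment is defined as indcode = 000000 minus indcode = 4911. The indcode = 55 row is there only to make the example more realistic.
PS: I had some trouble with "02-Jan", so feel free to rename it to Jan. I was just trying to keep it consistent with the real thing.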
indcode   <- c("000000","000000","000000","000000","55","4911","4911","4911","4911")
ownership <- c("10","10","10","10","10","10","10","10","10")
area      <- c("000000","031","029","017","029","000000","031","029","017")
# Backticks are needed because 02-Jan and 02-Feb are not syntactic R names
`02-Jan`  <- c(1000,600,300,100,50,100,50,40,10)
`02-Feb`  <- c(1003,601,301,101,51,101,51,41,11)
# data.frame() renames these columns to X02.Jan and X02.Feb (check.names = TRUE by default)
first <- data.frame(indcode, ownership, area, `02-Jan`, `02-Feb`)
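On the "02-Jan" difficulty: a minimal sketch, assuming the goal is to keep the month labels exactly as they appear in the real data. The names `jan`, `feb`, and `first_raw` are made up for this illustration. Passing `check.names = FALSE` stops `data.frame()` from rewriting the labels, at the cost of needing backticks or `[[ ]]` whenever the columns are referenced.

```r
# Same sample values as in the question; jan/feb/first_raw are illustrative names.
indcode   <- c("000000","000000","000000","000000","55","4911","4911","4911","4911")
ownership <- rep("10", 9)
area      <- c("000000","031","029","017","029","000000","031","029","017")
jan <- c(1000,600,300,100,50,100,50,40,10)
feb <- c(1003,601,301,101,51,101,51,41,11)

# check.names = FALSE keeps "02-Jan" and "02-Feb" instead of X02.Jan and X02.Feb.
first_raw <- data.frame(indcode, ownership, area,
                        `02-Jan` = jan, `02-Feb` = feb,
                        check.names = FALSE)

names(first_raw)

# Non-syntactic names must be quoted with backticks or accessed via [[ ]].
first_raw$`02-Jan`
first_raw[["02-Feb"]]
```

The rest of this thread uses the default renamed columns (X02.Jan, X02.Feb), which are easier to type.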
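So the result should have one row per area, as in the example below. The actual 02-Jan value would be 900 rather than 1000-100, but I think writing it this way makes it clearer.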
indcode ownership area   02-Jan   02-Feb
910     10        000000 1000-100 1003-101
910     10        031    600-50   601-51
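If the numeric difference is what is ultimately needed (900 rather than the string 1000-100), one possible sketch is below. It assumes the default renamed columns X02.Jan and X02.Feb, and that each area has exactly one indcode = 000000 row and one indcode = 4911 row for ownership 10. The name `federal` is made up for this illustration. The idea is to flip the sign of the 4911 rows so that an ordinary sum per area gives 000000 minus 4911.

```r
library(dplyr)

# Sample data from the question, with the column names data.frame() produces by default.
first <- data.frame(
  indcode   = c("000000","000000","000000","000000","55","4911","4911","4911","4911"),
  ownership = rep("10", 9),
  area      = c("000000","031","029","017","029","000000","031","029","017"),
  X02.Jan   = c(1000,600,300,100,50,100,50,40,10),
  X02.Feb   = c(1003,601,301,101,51,101,51,41,11)
)

federal <- first |>
  # Keep only the two industry codes that enter the definition.
  filter(indcode %in% c("000000", "4911"), ownership == "10") |>
  # Negate the 4911 rows so that summing yields 000000 minus 4911.
  mutate(across(starts_with("X02"), ~ if_else(indcode == "4911", -.x, .x))) |>
  summarize(across(starts_with("X02"), sum), .by = c(ownership, area)) |>
  # Label the new rows with the 910 code and put it first.
  mutate(indcode = "910", .before = 1)

federal
# X02.Jan: 900, 550, 260, 90  (areas 000000, 031, 029, 017)
# X02.Feb: 902, 550, 260, 90
```

Because the indcode = 55 row is filtered out first, it cannot affect the totals. Selecting with `starts_with("X02")` should also pick up the remaining month columns in the real data, provided they follow the same naming pattern.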
library(dplyr)
first |>
summarize(across(3:4, ~paste(rev(range(.)), collapse = "-")), .by = area)
# "3:4" refers to the 3rd and 4th columns once the grouping column (area) is set aside
# We could alternatively specify the columns by name, e.g. X02.Jan:X02.Feb
Result
area X02.Jan X02.Feb
1 000000 1000-100 1003-101
2 031 600-50 601-51
3 029 300-40 301-41
4 017 100-10 101-11
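One caveat: `range()` picks the largest and smallest value in each area regardless of indcode. That matches the 000000 and 4911 rows here only because the indcode = 55 value happens to fall between them. A hedged variant that selects the two codes explicitly is sketched below. The names `tweaked`, `by_range`, and `by_code` are made up for this illustration, and it assumes one 000000 row and one 4911 row per area.

```r
library(dplyr)

first <- data.frame(
  indcode   = c("000000","000000","000000","000000","55","4911","4911","4911","4911"),
  ownership = rep("10", 9),
  area      = c("000000","031","029","017","029","000000","031","029","017"),
  X02.Jan   = c(1000,600,300,100,50,100,50,40,10),
  X02.Feb   = c(1003,601,301,101,51,101,51,41,11)
)

# Hypothetical change: make the indcode 55 row the largest value in area 029.
tweaked <- first
tweaked$X02.Jan[tweaked$indcode == "55"] <- 5000

# The range-based approach now picks up the indcode 55 value for area 029.
by_range <- tweaked |>
  summarize(across(3:4, ~ paste(rev(range(.)), collapse = "-")), .by = area)

# Selecting by indcode ignores any other codes in the same area.
by_code <- tweaked |>
  filter(indcode %in% c("000000", "4911")) |>
  summarize(
    across(starts_with("X02"),
           ~ paste(.x[indcode == "000000"], .x[indcode == "4911"], sep = "-")),
    .by = area
  )

by_range$X02.Jan[by_range$area == "029"]  # "5000-40"
by_code$X02.Jan[by_code$area == "029"]    # "300-40"
```

On the unmodified sample data both versions should give the same strings as the result shown above.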