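How can I reshape my data in R when the first row contains the years? I want to create a new column holding the year.
I can't use the pivot functions directly, because the year is not stored in a separate column but in the first row. In addition, when I import the dataset, 20 variables turn into roughly 260 (the column names are made unique, so one variable becomes 15 variables, one for each of 15 years).
This is what my data looks like: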
tibble(
ID = c(NA, 1,2,3,4),
Region = c(NA, "Region A", "Region B", "Region C", "Region D"),
Variable1 = c(2017, 1, 2, 3, 4),
Variable1.1 = c(2018, 5, 6, 7, 8),
Variable1.2 = c(2019, 9, 10, 11, 12),
Variable2 = c(2019, 13, 14, 15, 16),
Variable2.1 = c(2020, 17, 18, 19, 20)
)
This is what I would like my data to look like:
tibble(
ID = c(1,1,1,1,2,2),
Region = c("Region A","Region A","Region A","Region A", "Region B","Region B"),
Year = c(2017, 2018, 2019, 2020,2017,2018),
Variable1 = c(1,5,9,NA,2,6),
Variable2 = c(NA,NA,13,17,NA,NA)
)
If you start with a data.frame called dd:
dd <- tibble(
ID = c(NA, 1,2,3,4),
Region = c(NA, "Region A", "Region B", "Region C", "Region D"),
Variable1 = c(2017, 1, 2, 3, 4),
Variable1.1 = c(2018, 5, 6, 7, 8),
Variable1.2 = c(2019, 9, 10, 11, 12),
Variable2 = c(2019, 13, 14, 15, 16),
Variable2.1 = c(2020, 17, 18, 19, 20)
)
You need to turn it into a more "normal" data.frame, where the years are part of the column names rather than sitting in the first row.
names(dd)[-(1:2)]<-paste(gsub("\\..*$","",names(dd)[-(1:2)]), as.character(dd[1,-(1:2)]), sep="_")
dd <- dd[-1,]
head(dd)
# ID Region Variable1_2017 Variable1_2018 Variable1_2019 Variable2_2019 Variable2_2020
# <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 Region A 1 5 9 13 17
# 2 2 Region B 2 6 10 14 18
# 3 3 Region C 3 7 11 15 19
# 4 4 Region D 4 8 12 16 20
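The one-line rename above can be unpacked into smaller steps. The following is only a sketch: the helper name promote_year_row and its n_id argument are hypothetical, introduced here for illustration. It assumes that the first row holds the year for every non-identifier column and that any dot in a column name is a de-duplication suffix added on import.

```r
library(tibble)

# Hypothetical helper: fold the first row (the years) into the column names.
# n_id is the number of leading identifier columns to leave untouched
# (an assumption; it is 2 here for ID and Region).
promote_year_row <- function(df, n_id = 2) {
  value_idx <- seq_along(df)[-seq_len(n_id)]
  # Drop the ".1", ".2", ... suffixes that made the names unique on import
  base <- sub("\\..*$", "", names(df)[value_idx])
  # Read the year from the first row of each value column
  years <- vapply(df[1, value_idx], function(x) as.character(x), character(1))
  names(df)[value_idx] <- paste(base, years, sep = "_")
  # The first row has been moved into the header, so remove it
  df[-1, ]
}

dd <- tibble(
  ID = c(NA, 1, 2, 3, 4),
  Region = c(NA, "Region A", "Region B", "Region C", "Region D"),
  Variable1 = c(2017, 1, 2, 3, 4),
  Variable1.1 = c(2018, 5, 6, 7, 8),
  Variable1.2 = c(2019, 9, 10, 11, 12),
  Variable2 = c(2019, 13, 14, 15, 16),
  Variable2.1 = c(2020, 17, 18, 19, 20)
)

out <- promote_year_row(dd)
names(out)
```

Wrapping the step in a function makes it easier to reuse on the full import with its roughly 260 columns, as long as the identifier columns come first.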
Then you can do a standard pivot:
tidyr::pivot_longer(dd, -c(ID, Region), names_sep="_", names_to=c(".value", "year"))
# ID Region year Variable1 Variable2
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 1 Region A 2017 1 NA
# 2 1 Region A 2018 5 NA
# 3 1 Region A 2019 9 13
# 4 1 Region A 2020 NA 17
# ...
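In the output above the year column comes back as character, while the desired result has a numeric Year. A minimal sketch of one way to handle this, assuming a tidyr version that supports the names_transform argument of pivot_longer (1.1.0 or later, as far as I know); the object names wide and long are just placeholders for illustration:

```r
library(tibble)
library(tidyr)

# The data after the renaming step, written out directly so the
# example is self-contained
wide <- tibble(
  ID = c(1, 2, 3, 4),
  Region = c("Region A", "Region B", "Region C", "Region D"),
  Variable1_2017 = c(1, 2, 3, 4),
  Variable1_2018 = c(5, 6, 7, 8),
  Variable1_2019 = c(9, 10, 11, 12),
  Variable2_2019 = c(13, 14, 15, 16),
  Variable2_2020 = c(17, 18, 19, 20)
)

long <- pivot_longer(
  wide,
  cols = -c(ID, Region),
  names_to = c(".value", "Year"),   # text before "_" becomes the column name
  names_sep = "_",
  names_transform = list(Year = as.integer)  # convert "2017" to 2017L
)

long
```

Combinations that do not exist in the wide data, such as Variable2 for 2017, are filled with NA, which matches the desired output in the question.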
The unheadr package has a function for this purpose.
dd |>
unheadr::mash_colnames(1,keep_names = FALSE)
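Result (with keep_names = FALSE the original column names are discarded, so the ID and Region columns end up named NA and 2019 appears twice):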
# A tibble: 4 × 7
`NA` `NA` `2017` `2018` `2019` `2019` `2020`
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Region A 1 5 9 13 17
2 2 Region B 2 6 10 14 18
3 3 Region C 3 7 11 15 19
4 4 Region D 4 8 12 16 20
Warning message:
In unheadr::mash_colnames(dd, 1, keep_names = FALSE) :
possible NA values in variable names, check the `n_name_rows` argument