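I have a wide-format data frame that I want to reshape to long format by collapsing its columns, but I can't figure out how to get pivot_longer() and paste() to work together properly.
Here is a sample dataset: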
df <- data.frame(Site_Number = c(1, 1, 1, 2),
                 Subsite_Number = c(1, 2, 3, 1),
                 C1 = c("Red", "Blue", "Green", "Red"),
                 C2 = c("Red", "Red", "Red", "Red"),
                 C3 = c("Blue", "Green", "NA", "Blue"),
                 C4 = c("Red", "NA", "NA", "NA"))
It looks like this:
Site_Number Subsite_Number    C1  C2    C3  C4
          1              1   Red Red  Blue Red
          1              2  Blue Red Green  NA
          1              3 Green Red    NA  NA
          2              1   Red Red  Blue  NA
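I want to create a unique ID from Site_Number, Subsite_Number, and the "C" column name, and reshape the data frame to long format while keeping the values from the C columns. The catch is that some of the "C" columns contain NA, and those entries don't need to be carried over into the new dataset. This is the output I'm hoping for: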
ID Color
S1T1C1 Red
S1T1C2 Red
S1T1C3 Blue
S1T1C4 Red
S1T2C1 Blue
S1T2C2 Red
S1T2C3 Green
S1T3C1 Green
S1T3C2 Red
S2T1C1 Red
S2T1C2 Red
S2T1C3 Blue
In the ID, S = Site_Number, T = Subsite_Number, and C comes from the column names C1:C4.
Any ideas on how to do this properly?
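library(dplyr)
library(tidyr)
df |>
  # build the site/subsite part of the ID first
  mutate(ID = paste0("S", Site_Number, "T", Subsite_Number)) |>
  select(-Site_Number, -Subsite_Number) |>
  # one row per ID and C column; the column names land in `name`
  pivot_longer(-ID) |>
  # drop the literal "NA" strings as well as any real missing values
  filter(value != "NA" & !is.na(value)) |>
  # append the column name (C1..C4) to complete the ID
  mutate(ID = paste0(ID, name))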
# # A tibble: 12 × 3
# ID name value
# <chr> <chr> <chr>
# 1 S1T1C1 C1 Red
# 2 S1T1C2 C2 Red
# 3 S1T1C3 C3 Blue
# 4 S1T1C4 C4 Red
# 5 S1T2C1 C1 Blue
# 6 S1T2C2 C2 Red
# 7 S1T2C3 C3 Green
# 8 S1T3C1 C1 Green
# 9 S1T3C2 C2 Red
# 10 S2T1C1 C1 Red
# 11 S2T1C2 C2 Red
# 12 S2T1C3 C3 Blue
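If you want exactly the two columns from the question (ID and Color), a variation on the pipeline above might look like the sketch below. It assumes the "NA" entries in the sample data are literal strings, as written in the data.frame() call, so it converts them to real missing values first and lets pivot_longer() drop them. The names `result` and the intermediate column `C` are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(Site_Number = c(1, 1, 1, 2),
                 Subsite_Number = c(1, 2, 3, 1),
                 C1 = c("Red", "Blue", "Green", "Red"),
                 C2 = c("Red", "Red", "Red", "Red"),
                 C3 = c("Blue", "Green", "NA", "Blue"),
                 C4 = c("Red", "NA", "NA", "NA"))

result <- df |>
  # assumption: "NA" is a literal string here, so turn it into a real NA
  mutate(across(C1:C4, ~ na_if(.x, "NA"))) |>
  # go long, naming the new columns directly and dropping missing values
  pivot_longer(C1:C4, names_to = "C", values_to = "Color",
               values_drop_na = TRUE) |>
  # assemble the full ID from site, subsite, and the original column name
  mutate(ID = paste0("S", Site_Number, "T", Subsite_Number, C)) |>
  select(ID, Color)

print(result)
```

Naming the columns inside pivot_longer() avoids a separate rename step, and values_drop_na = TRUE replaces the explicit filter() once the strings have been converted to real NA values.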