I have a wide data frame:
Year <- c(2020, 2021)
Percent_a <- c(10,10)
Percent_b <- c(12,10)
Percent_c <- c(2,4)
Percent_d <- c(4,5)
df <- data.frame(Year, Percent_a, Percent_b, Percent_c, Percent_d)
I would like my data in this format:
Year Item Percent
2020 a 10
2020 b 12
2020 c 2
2020 d 4
2021 a 10
2021 b 10
2021 c 4
2021 d 5
I tried this:
df %>%
  pivot_longer(
    cols = -Year,
    names_to = c(".value", "Percent"),
    names_pattern = "(.)_(.*)",
    values_to = "Percentage"
  ) -> df_longer
It almost works, but I get something like this instead. What is "t"?
Year Percent t
2020 a 10
2020 b 12
2020 c 2
2020 d 4
2021 a 10
2021 b 10
2021 c 4
2021 d 5
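The "t" is the last letter of "Percent". The first group in your names_pattern, "(.)", matches only the single character immediately before the underscore, and because that group is mapped to ".value", it becomes the name of the value column. Your regex groups need to be swapped so that the first group captures the whole word "Percent" rather than just its last character, and the second group captures the single-letter suffix, which you can then name "Item".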
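To see what each pattern captures, you can test both on a single column name with base R's regexec(). This is only an illustration: tidyr does its own matching internally, but for these simple patterns the captured groups should be the same as the ones that produced the output above.

```r
# Apply both names_pattern regexes to one column name and inspect the groups.
nm <- "Percent_a"

# Original pattern: the leftmost match is "t_a", so the groups are "t" and "a".
old <- regmatches(nm, regexec("(.)_(.*)", nm))[[1]]

# Corrected pattern: the groups are "Percent" and "a".
new <- regmatches(nm, regexec("(.*)_(.)", nm))[[1]]

old[-1]  # drop the full match, keep only the capture groups
new[-1]
```

With ".value" as the first entry of names_to, the first group names the value column, so the original pattern yields a column called "t" while the corrected one yields "Percent".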
Try:
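library(tidyr)  # pivot_longer(); tidyr also re-exports the %>% pipe

df %>%
  pivot_longer(
    cols = -Year,
    names_to = c(".value", "Item"),  # group 1 names the value column, group 2 fills Item
    names_pattern = "(.*)_(.)"       # group 1 = "Percent", group 2 = the single letter
  )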
Output:
# A tibble: 8 × 3
Year Item Percent
<dbl> <chr> <dbl>
1 2020 a 10
2 2020 b 12
3 2020 c 2
4 2020 d 4
5 2021 a 10
6 2021 b 10
7 2021 c 4
8 2021 d 5
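Because every value column here shares the same "Percent_" prefix, a sketch of an alternative that avoids ".value" altogether is to strip the prefix with pivot_longer()'s names_prefix argument and name the value column directly with values_to. This assumes the single-prefix layout shown in the question.

```r
library(tidyr)

df <- data.frame(
  Year = c(2020, 2021),
  Percent_a = c(10, 10),
  Percent_b = c(12, 10),
  Percent_c = c(2, 4),
  Percent_d = c(4, 5)
)

# Remove the shared "Percent_" prefix so only the letter remains as Item,
# and store the cell values in a column named Percent.
df_longer <- pivot_longer(
  df,
  cols = -Year,
  names_to = "Item",
  names_prefix = "Percent_",
  values_to = "Percent"
)
df_longer
```

This produces the same three columns. The ".value" approach is the more general one: it still works when the wide data holds several different measures (for example Percent_a alongside Count_a), where a single values_to name would not be enough.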