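This is my first time asking a question here and I'm still new to R, so please bear with me if my formatting or wording is off. Thanks in advance for your help!
I have several columns stored as factors in a data frame. However, after running a chunk of code that pipes through dplyr functions, the tibble output does not keep the factor level order from the data frame. The columns come out in the wrong order.
Here are the factor levels (I've cut some of the preceding code because it is long):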
...
POST_600_WOBA_GRADE = factor(POST_600_WOBA_GRADE, levels = c("Elite", "Great", "Good", "Average", "Below Average", "Poor", "Awful", "DNQ")))
When I run this to check, the levels are shown in the correct order:
levels(all_batter_career_data_by_ID$POST_600_WOBA_GRADE)
[1] "Elite" "Great" "Good" "Average" "Below Average"
[6] "Poor" "Awful" "DNQ"
However, when I run the command below, DNQ and Awful are swapped. DNQ (did not qualify) should be the rightmost column.
all_batter_career_data_by_ID %>%
  filter(BAT_DEBUT < "2015-01-01") %>%
  group_by(FIRST_600_WOBA_GRADE, POST_600_WOBA_GRADE) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = POST_600_WOBA_GRADE, values_from = n)
# A tibble: 8 × 9
# Groups:   FIRST_600_WOBA_GRADE [8]
  FIRST_600_WOBA_GRADE Elite Great  Good Average `Below Average`  Poor   DNQ Awful
  <fct>                <int> <int> <int>   <int>           <int> <int> <int> <int>
1 Elite                    1     1     4       1              NA    NA    NA    NA
2 Great                    5     9    14       7               3     1     8    NA
3 Good                     1    21    62      56              20    12    44     1
4 Average                  1    10    45      77              29    29    78     6
5 Below Average           NA     5    24      36              14    26    56     7
6 Poor                    NA     5    22      35              14    33    94    20
7 Awful                   NA     1     7      21               9    38   132    22
8 DNQ                     NA    NA    NA      NA              NA    NA  1126    NA
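I run exactly the same code for another variable with the same factor levels and that tibble comes out in the correct order, so I'm not sure why this one is flipped. Thanks again for your help!
To see what is going on, here is some example data: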
set.seed(0)
df <- data.frame(
  "first_grade" = "Elite",
  "second_grade" = c("Elite", "Great", "Good", "Average", "Below Average", "Poor", "Awful", "DNQ") |>
    factor(levels = c("Elite", "Great", "Good", "Average", "Below Average", "Poor", "Awful", "DNQ")),
  "n" = sample(1:100, 8, replace = TRUE)
)
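Pivoting this data wider as it stands gives the columns in the right order, but as soon as we shuffle the rows, the columns come out in the wrong order: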
df <- df[sample(1:8, 8),]
tidyr::pivot_wider(df, names_from = "second_grade", values_from = "n")
# A tibble: 1 × 9
  first_grade Great  Good Awful Elite `Below Average` Average   DNQ  Poor
  <chr>       <int> <int> <int> <int>           <int>   <int> <int> <int>
1 Elite          68    39    43    14              34       1    14    87
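The reason is that pivot_wider() by default creates the new columns in the order the values first appear in the data, not in factor-level order. The sketch below illustrates this on made-up data that mimics a grouped summary; the `toy` data frame, its counts, and the names `grade_levels`, `first_seen` and `default_cols` are all hypothetical and only there for illustration.

```r
library(tidyr)

grade_levels <- c("Elite", "Awful", "DNQ")

# Hypothetical summarised counts, sorted by group the way summarize() leaves them.
# The first group ("Great") has no "Awful" row, so "DNQ" is encountered first.
toy <- data.frame(
  first_grade  = c("Great", "Great", "Good", "Good", "Good"),
  second_grade = factor(c("Elite", "DNQ", "Elite", "Awful", "DNQ"),
                        levels = grade_levels),
  n            = c(5L, 8L, 1L, 1L, 44L)
)

# Order in which the grades first appear in the rows
first_seen <- as.character(unique(toy$second_grade))

# Default pivot_wider(): the new columns follow that first-appearance order
default_cols <- names(pivot_wider(toy, names_from = second_grade, values_from = n))

print(first_seen)
print(default_cols)
```

This matches the tibble in the question: the Great row has a DNQ count but no Awful count, so DNQ is seen before Awful. The other variable presumably came out right only because its grades happened to appear in level order.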
As Darren pointed out, the solution is to pass the names_sort = TRUE argument to pivot_wider() (sorting the data by the column you are pivoting from beforehand also works):
tidyr::pivot_wider(df, names_from = "second_grade", values_from = "n", names_sort = TRUE)
Output:
# A tibble: 1 × 9
  first_grade Elite Great  Good Average `Below Average`  Poor Awful   DNQ
  <chr>       <int> <int> <int>   <int>           <int> <int> <int> <int>
1 Elite          14    68    39       1              34    87    43    14
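The alternative mentioned above, sorting by the column before pivoting, might look like the following sketch. It assumes dplyr is loaded and relies on arrange() ordering a factor by its levels rather than alphabetically. The names `grades` and `wide` are introduced here for illustration, and the counts and the shuffled row order are copied from the example output above so that the block does not depend on the random seed.

```r
library(dplyr)
library(tidyr)

grades <- c("Elite", "Great", "Good", "Average",
            "Below Average", "Poor", "Awful", "DNQ")

df <- data.frame(
  first_grade  = "Elite",
  second_grade = factor(grades, levels = grades),
  n            = c(14L, 68L, 39L, 1L, 34L, 87L, 43L, 14L)
)

# Put the rows in the same shuffled order as the example above
df <- df[c(2, 3, 7, 1, 5, 4, 8, 6), ]

wide <- df %>%
  arrange(second_grade) %>%   # sorts by factor level, so "Elite" comes first
  pivot_wider(names_from = second_grade, values_from = n)

print(names(wide))
```

For a grouped summary like the one in the question, names_sort = TRUE is probably the simpler choice, since arranging by the pivoted column also reorders the rows of the result.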