Factor levels out of order in a tibble


This is my first question here and I am still new to R, so please bear with me if my formatting or wording is off. Thanks in advance for your help!

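I have several columns stored as factors in a data frame. However, after running a block of code that pipes through dplyr functions, the tibble output does not preserve the factor level order from the data frame. The columns come out in the wrong order.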

Here are the factor levels (I removed some of the preceding code because it is long):

...
POST_600_WOBA_GRADE = factor(POST_600_WOBA_GRADE, levels = c("Elite", "Great", "Good", "Average", "Below Average", "Poor", "Awful", "DNQ")))

When I run this to confirm, the levels are displayed correctly:

levels(all_batter_career_data_by_ID$POST_600_WOBA_GRADE)
[1] "Elite"         "Great"         "Good"          "Average"       "Below Average"
[6] "Poor"          "Awful"         "DNQ" 

However, when I run the command below, DNQ and Awful are flipped. DNQ (did not qualify) should be the rightmost column.

all_batter_career_data_by_ID %>%
  filter(BAT_DEBUT < "2015-01-01") %>%
  group_by(FIRST_600_WOBA_GRADE, POST_600_WOBA_GRADE) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = POST_600_WOBA_GRADE, values_from = n)

# A tibble: 8 × 9
# Groups:   FIRST_600_WOBA_GRADE [8]
  FIRST_600_WOBA_GRADE Elite Great  Good Average `Below Average`  Poor   DNQ Awful
  <fct>                <int> <int> <int>   <int>           <int> <int> <int> <int>
1 Elite                    1     1     4       1              NA    NA    NA    NA
2 Great                    5     9    14       7               3     1     8    NA
3 Good                     1    21    62      56              20    12    44     1
4 Average                  1    10    45      77              29    29    78     6
5 Below Average           NA     5    24      36              14    26    56     7
6 Poor                    NA     5    22      35              14    33    94    20
7 Awful                   NA     1     7      21               9    38   132    22
8 DNQ                     NA    NA    NA      NA              NA    NA  1126    NA

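I am running exactly the same code for another variable with the same factor levels, and that tibble comes out in the correct order, so I am not sure why this one is flipped. Thanks again for your help!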

r dplyr tibble factors
1 Answer

To see what is happening, here is some example data:

set.seed(0)

df <- data.frame(
  "first_grade" = "Elite",
  "second_grade" = c("Elite", "Great", "Good", "Average", "Below Average", "Poor", "Awful", "DNQ") |> factor(levels = c("Elite", "Great", "Good", "Average", "Below Average", "Poor", "Awful", "DNQ")),
  "n" = sample(1:100, 8, replace = TRUE))

Pivoting this data wider gives the correct column order, but once we shuffle the rows randomly, the order is wrong:

df <- df[sample(1:8, 8),]  

tidyr::pivot_wider(df, names_from = "second_grade", values_from = "n")

# A tibble: 1 × 9
  first_grade Great  Good Awful Elite `Below Average` Average   DNQ  Poor
  <chr>       <int> <int> <int> <int>           <int>   <int> <int> <int>
1 Elite          68    39    43    14              34       1    14    87
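
The shuffled result suggests what is going on: with the default `names_sort = FALSE`, `pivot_wider()` orders the new columns by the first appearance of each value in the rows, not by factor level. The sketch below uses hypothetical toy data (the `toy` data frame and its counts are made up for illustration) in which "DNQ" appears in an earlier row than "Awful". This mirrors the question's summarized output, where the "Great" group has DNQ rows but no Awful rows.

```r
library(tidyr)

grades <- c("Elite", "Great", "Good", "Average",
            "Below Average", "Poor", "Awful", "DNQ")

# Hypothetical toy data: "DNQ" occurs in an earlier row than "Awful"
toy <- data.frame(
  first_grade  = c("Elite", "Great", "Great", "Good"),
  second_grade = factor(c("Elite", "Elite", "DNQ", "Awful"), levels = grades),
  n            = c(1L, 5L, 8L, 1L)
)

wide <- pivot_wider(toy, names_from = second_grade, values_from = n)

# The new columns follow first appearance in the rows, not the factor levels
new_cols <- setdiff(names(wide), "first_grade")
appearance_order <- unique(as.character(toy$second_grade))
identical(new_cols, appearance_order)
```

If this reading is right, the other variable in the question probably came out correctly only because its values happened to first appear in level order.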

As Darren pointed out, the solution is to pass the names_sort = TRUE argument to pivot_wider() (sorting the rows by the column you are pivoting on also works):

tidyr::pivot_wider(df, names_from = "second_grade", values_from = "n", names_sort = TRUE)

Output:

# A tibble: 1 × 9
  first_grade Elite Great  Good Average `Below Average`  Poor Awful   DNQ
  <chr>       <int> <int> <int>   <int>           <int> <int> <int> <int>
1 Elite          14    68    39       1              34    87    43    14
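
For completeness, here is a minimal sketch of the other option, sorting the rows before pivoting. It assumes dplyr is available and rebuilds the same shuffled example data. `arrange()` sorts a factor by its level order, so each value first appears in level order and the default first-appearance column ordering then matches the levels.

```r
library(dplyr)
library(tidyr)

grades <- c("Elite", "Great", "Good", "Average",
            "Below Average", "Poor", "Awful", "DNQ")

set.seed(0)
df <- data.frame(
  first_grade  = "Elite",
  second_grade = factor(grades, levels = grades),
  n            = sample(1:100, 8, replace = TRUE)
)
df <- df[sample(1:8, 8), ]  # shuffle the rows

wide <- df %>%
  arrange(second_grade) %>%   # factors sort by level order, not alphabetically
  pivot_wider(names_from = second_grade, values_from = n)

names(wide)
```

In the question's pipeline, the smaller change is likely to add `names_sort = TRUE` to the existing `pivot_wider()` call, since that does not depend on row order at all.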