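I ran into this odd situation while trying to modify the levels of a factor variable by grouping levels into one. Basically, my new level is created, but all the remaining levels seem to shift up to the next level. Here are my sample data, the code I used, and the output.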
library(tidyverse)
data <- structure(list(factor1 = structure(c(1L, 1L, 2L, 3L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 1L, 4L), .Label = c("0", "1", "2", "3",
"4", "5", "6", "7"), class = "factor")), row.names = c(NA, -30L
), class = c("tbl_df", "tbl", "data.frame"), .Names = "factor1")
data_out <- data %>% mutate(factor1 = ifelse(factor1 %in% c('0', '1'),
factor1, '>1'))
structure(list(factor1 = c("1", "1", "2", ">1", "1", "2", "1",
"1", "2", "2", "2", "2", "2", "1", "2", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", ">1", "1", "1", "1", ">1")), .Names = "factor1",
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L))
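Is this the intended behavior? It is certainly not what I want. How can it be explained, and then fixed?
My guess is that the problem has to do with how factors are built. It is still unclear to me how the levels {"0", "1"} were turned into {"1", "2", ">1"} by the call above.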
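R factors are really 1-based integer vectors with an attribute that holds their levels. So your "0" level was actually the integer 1, and your "1" level was the integer 2. Apparently `mutate` sees fit to create a new factor with an additional level printed as ">1", but it also reassigns the "0" level to a new "1" level and the "1" level to a "2" level. To me this is dangerous behavior for `mutate`. I think it should either give you a new factor with levels "0", "1", ">1" or throw an error.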
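To make the integer-code point concrete, here is a small sketch. The factor `f` is made up for illustration and is not the question's data. It shows that the stored codes differ from the printed labels, and that `ifelse()` returns the codes of a factor passed as its `yes` argument, because the result does not keep that argument's factor attributes.

```r
# A small factor whose labels look like numbers starting at "0".
f <- factor(c("0", "0", "1", "2"), levels = c("0", "1", "2", "3"))

# The underlying storage is 1-based integer codes, not the labels.
codes  <- as.integer(f)    # positions of each value in levels(f)
labels <- as.character(f)  # the labels as they are printed

# ifelse() fills its result from `yes` without the factor attributes,
# so the integer codes leak through; they are then coerced to
# character because the `no` argument is the string ">1".
leaked <- ifelse(f %in% c("0", "1"), f, ">1")

print(codes)
print(labels)
print(leaked)
```

The last vector reproduces the shift described in the question: the labels "0" and "1" come back as "1" and "2".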
The error actually comes from `ifelse`, although `mutate` compounds the problem by also making the new column a factor. If you coerce `data` to a plain data frame, you will see:
data$factor2 <- ifelse(data$factor1 %in% c('0', '1'),
                       data$factor1, '>1')
data
#-------- same issue except
  factor1 factor2
1       0       1
2       0       1
3       1       2
4       2      >1
.... (other 26 rows deleted)
> str(data)
'data.frame':	30 obs. of  2 variables:
 $ factor1: Factor w/ 8 levels "0","1","2","3",..: 1 1 2 3 1 2 1 1 2 2 ...
 $ factor2: chr  "1" "1" "2" ">1" ...
This would let you stay within the tidyverse:
In case anyone runs into a similar problem in the future and is looking for a simple way to group these levels without reassigning the remaining ones:
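A minimal sketch of such a grouping, using base R only. The factor `f` and the names `grouped_ifelse` and `grouped_levels` are hypothetical and chosen for illustration; this is one possible approach, not necessarily the code the answers originally showed. The first option converts the factor to character before it reaches `ifelse()`, and the second merges levels by assigning a named list to `levels()`.

```r
# Hypothetical stand-in for the question's column: labels "0" to "7".
f <- factor(c("0", "0", "1", "2", "3"), levels = as.character(0:7))

# Option 1: make ifelse() return labels rather than integer codes,
# then rebuild the factor with an explicit level order.
grouped_ifelse <- factor(
  ifelse(f %in% c("0", "1"), as.character(f), ">1"),
  levels = c("0", "1", ">1")
)

# Option 2: merge levels in place by assigning a named list to levels().
# Each list name is a new level; its value lists the old levels it absorbs.
grouped_levels <- f
levels(grouped_levels) <- list(
  "0"  = "0",
  "1"  = "1",
  ">1" = as.character(2:7)
)

print(grouped_ifelse)
print(table(grouped_levels))
```

Both options keep "0" and "1" as they are and collect everything else under ">1". The `factor(ifelse(...))` expression from the first option should also work unchanged inside `mutate`, if you prefer to keep the pipeline from the question.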