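My data contains an unexpected factor level that combines two levels with &: "intermediate 7 & 8".
What is the best way to relevel this value? Other levels may be combined in the same way in the future, e.g. "Beginner 3 & 4".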
#Relevel factors
Sample <- as.factor(c("Beginner 1","intermediate 8", "intermediate 7 & 8",
"Expert 2","Expert 10","Beginner 3 & 4","Beginner 5",
"Beginner 10", "intermediate 1", "Expert 1", NA))
newLevel <- factor(c("NA", paste0("Beginner ", 1:10), paste0("intermediate ", 1:10),
paste0("Expert ", 1:10)))
newSample <- factor(Sample, levels=newLevel)
newSample
# [1] Beginner 1 intermediate 8 <NA> Expert 2 Expert 10
# [6] Beginner 3 Beginner 5 Beginner 10 intermediate 1 Expert 1
# [11] <NA>
# 31 Levels: NA Beginner 1 Beginner 2 Beginner 3 Beginner 4 Beginner 5 ... Expert 10
#Change factor to Numeric
SampleNum <- as.numeric(factor(Sample, levels=newLevel))
SampleNum
# [1] 2 19 NA 23 31 4 6 11 12 22 NA
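So "intermediate 7 & 8" is treated as NA. It should fall between "intermediate 7" and "intermediate 8".
Is there a good way to break it apart so that it can also be converted to numeric?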
You could strip out the numbers and, where a level contains two of them, take their mean to get a quasi-numeric suffix.
suffix <- sapply(strsplit(trimws(gsub("\\D+", " ", levels(Sample))), " "), function(x)
mean(as.numeric(x)))
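To make that step easier to follow, here is a small sketch on two labels only. The vector labs and the names digits and suffix_demo are made up for illustration; the regex and the mean are the same as in the line above.

```r
# Illustrative labels only (not the full data set)
labs <- c("intermediate 7 & 8", "Beginner 10")

# Replace every run of non-digits with one space, then trim both ends
digits <- trimws(gsub("\\D+", " ", labs))
digits
# [1] "7 8" "10"

# Split on the space and average: a single number is kept,
# two numbers are replaced by their mean
suffix_demo <- sapply(strsplit(digits, " "), function(x) mean(as.numeric(x)))
suffix_demo
# [1]  7.5 10.0
```

A combined level therefore lands halfway between its two parts, which is what puts "intermediate 7 & 8" after "intermediate 7" and before "intermediate 8".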
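Then, to get a prefix, use cat.df as a lookup table that maps each category to a larger number (100, 200, 300), so that the categories sort in the right order ahead of the suffix.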
cat.df <- data.frame(c("Beginner", "intermediate", "Expert"),
(1:3)*100)
prefix <- sapply(gsub("(\\D+)\\s.*", "\\1", levels(Sample)), function(x, y)
cat.df[match(x, y), 2], cat.df[, 1])
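The same lookup can probably be written with a named vector instead of a data frame. This is an alternative sketch, not the code above: offsets, category and prefix_alt are names invented here, and it assumes the category is always the first word of the level.

```r
Sample <- factor(c("Beginner 1", "intermediate 8", "intermediate 7 & 8",
                   "Expert 2", "Expert 10", "Beginner 3 & 4", "Beginner 5",
                   "Beginner 10", "intermediate 1", "Expert 1", NA))

# Hypothetical lookup: category name -> offset
offsets <- c(Beginner = 100, intermediate = 200, Expert = 300)

# Keep the first word of each level by dropping everything from the
# first whitespace onwards
category <- sub("\\s.*", "", levels(Sample))

# Index the named vector by category; unname() drops the names again
prefix_alt <- unname(offsets[category])
```

A category that is missing from offsets would give NA here, which makes a new, unexpected category easy to spot.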
That is all you need to relevel the Sample vector: order the levels by prefix + suffix.
new.Sample <- factor(Sample, levels=levels(Sample)[order(prefix + suffix)])
new.Sample
# [1] Beginner 1 intermediate 8 intermediate 7 & 8 Expert 2
# [5] Expert 10 Beginner 3 & 4 Beginner 5 Beginner 10
# [9] intermediate 1 Expert 1 <NA>
# 10 Levels: Beginner 1 Beginner 3 & 4 Beginner 5 Beginner 10 ... Expert 10
data.frame(sort(new.Sample), as.numeric(sort(new.Sample)))
# sort.new.Sample. as.numeric.sort.new.Sample..
# 1 Beginner 1 1
# 2 Beginner 3 & 4 2
# 3 Beginner 5 3
# 4 Beginner 10 4
# 5 intermediate 1 5
# 6 intermediate 7 & 8 6
# 7 intermediate 8 7
# 8 Expert 1 8
# 9 Expert 2 9
# 10 Expert 10 10
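Since the question expects more combined levels later on, the steps can be wrapped in a helper. This is only a sketch: the function name relevel_by_rank and its categories argument are made up here, and it assumes every level looks like "category number" or "category number & number", with numbers below 100.

```r
relevel_by_rank <- function(f, categories = c("Beginner", "intermediate", "Expert")) {
  lv <- levels(f)
  # Numeric part: mean of all digit runs in the label (7 & 8 -> 7.5)
  suffix <- sapply(strsplit(trimws(gsub("\\D+", " ", lv)), " "),
                   function(x) mean(as.numeric(x)))
  # Category part: position of the first word in `categories`, times 100,
  # so the category always outweighs the numeric part
  prefix <- match(sub("\\s.*", "", lv), categories) * 100
  factor(f, levels = lv[order(prefix + suffix)])
}

Sample <- factor(c("Beginner 1", "intermediate 8", "intermediate 7 & 8",
                   "Expert 2", "Expert 10", "Beginner 3 & 4", "Beginner 5",
                   "Beginner 10", "intermediate 1", "Expert 1", NA))

res <- relevel_by_rank(Sample)
levels(res)      # levels in rank order
as.numeric(res)  # integer codes that follow that order
# [1]  1  7  6  9 10  2  3  4  5  8 NA
```

Note that as.numeric() returns the position of each level, not the value 7.5 itself; if the averaged number is needed, prefix + suffix can be kept alongside the factor.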
Data
Sample <- structure(c(1L, 10L, 9L, 7L, 6L, 3L, 4L, 2L, 8L, 5L, NA), .Label = c("Beginner 1",
"Beginner 10", "Beginner 3 & 4", "Beginner 5", "Expert 1", "Expert 10",
"Expert 2", "intermediate 1", "intermediate 7 & 8", "intermediate 8"
), class = "factor")