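My data contains an unexpected factor level that combines two levels with an &: "intermediate 7 & 8".
What is the best way to relevel this value? Levels may be combined this way again in the future, e.g. "Beginner 3&4".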
# Relevel factors
Sample <- as.factor(c("Beginner 1", "intermediate 8", "intermediate 7 & 8",
                      "Expert 2", "Expert 10", "Beginner 3", "Beginner 5", "Beginner 10",
                      "intermediate 1", "Expert 1", NA))
newLevel <- factor(c("NA", paste0("Beginner ", 1:10), paste0("intermediate ", 1:10),
                     paste0("Expert ", 1:10)))
newSample <- factor(Sample, levels = newLevel)
> newSample
[1] Beginner 1 intermediate 8 <NA> Expert 2 Expert 10 Beginner 3 Beginner 5 Beginner 10 intermediate 1
[10] Expert 1 <NA>
31 Levels: NA Beginner 1 Beginner 2 Beginner 3 Beginner 4 Beginner 5 Beginner 6 Beginner 7 Beginner 8 Beginner 9 Beginner 10 ... Expert 10
# Convert the factor to numeric
SampleNum <- as.numeric(factor(Sample, levels = newLevel))
> SampleNum
[1] 2 19 NA 23 31 4 6 11 12 22 NA
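So "intermediate 7 & 8" is treated as NA. It should fall between "intermediate 7" and "intermediate 8".
Is there a good way to split it up so that it can be converted to a number?
Thanks a lot!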
You can strip out the numbers, compute their mean, and reassign the result as new labels. Don't confuse levels= with labels=!
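As a small illustration of the levels=/labels= distinction, here is a minimal sketch on a three-element vector. The names `x`, `by_levels`, and `by_labels` are made up for this example and do not appear in the question.

```r
x <- c("intermediate 7", "intermediate 7 & 8", "intermediate 8")

# levels= lists the values that are allowed to appear in x (and their order).
# A value that is not listed is silently turned into NA.
by_levels <- factor(x, levels = c("intermediate 7", "intermediate 8"))
is.na(by_levels)
# FALSE  TRUE FALSE

# labels= renames the levels position by position; nothing is dropped.
by_labels <- factor(x, levels = x,
                    labels = c("intermediate 7", "intermediate 7.5", "intermediate 8"))
as.character(by_labels)
# "intermediate 7" "intermediate 7.5" "intermediate 8"
```

The first form is what produced the NA in the question; the code below relies on the second form.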
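# Group name: drop everything from the first whitespace onward
prefix <- gsub("\\s(.*)", "", levels(Sample))
# Replace non-digits with spaces, split, and average the numbers ("7 & 8" -> 7.5)
suffix <- sapply(strsplit(trimws(gsub("\\D+", " ", levels(Sample))), " "),
                 function(x) mean(as.numeric(x)))
new.levels <- paste(prefix, suffix)
# labels= renames the existing levels position by position
new.Sample <- factor(Sample, labels = new.levels)
cbind(Sample = levels(Sample), new.Sample = levels(new.Sample))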
# Sample new.Sample
# [1,] "Beginner 1" "Beginner 1"
# [2,] "Beginner 10" "Beginner 10"
# [3,] "Beginner 3" "Beginner 3"
# [4,] "Beginner 5" "Beginner 5"
# [5,] "Expert 1" "Expert 1"
# [6,] "Expert 10" "Expert 10"
# [7,] "Expert 2" "Expert 2"
# [8,] "intermediate 1" "intermediate 1"
# [9,] "intermediate 7 & 8" "intermediate 7.5"
# [10,] "intermediate 8" "intermediate 8"
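# Note: as.numeric() on a factor returns its integer level codes,
# i.e. positions in levels(new.Sample)
as.numeric(new.Sample)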
# [1] 1 10 9 7 6 3 4 2 8 5 NA
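If the goal is a number that reflects the order Beginner < intermediate < Expert, one option is to compute it directly from the label text. The following is only a sketch under stated assumptions: each group has 10 steps and the groups are ordered as in the question's `newLevel` vector, but the scale starts at 1 because the literal "NA" level is left out. The helper `level_to_number` and its arguments `groups` and `steps` are hypothetical names introduced here.

```r
Sample <- factor(c("Beginner 1", "intermediate 8", "intermediate 7 & 8",
                   "Expert 2", "Expert 10", "Beginner 3", "Beginner 5",
                   "Beginner 10", "intermediate 1", "Expert 1", NA))

# Hypothetical helper: map labels to a 1-30 scale
# (Beginner 1-10, intermediate 11-20, Expert 21-30 under the assumptions above).
level_to_number <- function(x, groups = c("Beginner", "intermediate", "Expert"),
                            steps = 10) {
  x <- as.character(x)
  out <- rep(NA_real_, length(x))
  ok <- !is.na(x)                      # missing values stay NA
  # Group name: everything before the first whitespace (case-sensitive)
  prefix <- sub("\\s.*", "", x[ok])
  # Average all numbers in the label, so "7 & 8" and "7&8" both give 7.5
  digits <- strsplit(trimws(gsub("\\D+", " ", x[ok])), " ")
  within <- vapply(digits, function(d) mean(as.numeric(d)), numeric(1))
  # Shift each group by a multiple of `steps`; unknown group names give NA
  out[ok] <- (match(prefix, groups) - 1) * steps + within
  out
}

level_to_number(Sample)
# 1, 18, 17.5, 22, 30, 3, 5, 10, 11, 21, NA
level_to_number("Beginner 3&4")
# 3.5
```

Working on the text rather than on the factor codes means a combined label such as "intermediate 7 & 8" lands between its two neighbours, and new combinations need no changes to a predefined level list.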
Data
Sample <- structure(c(1L, 10L, 9L, 7L, 6L, 3L, 4L, 2L, 8L, 5L, NA), .Label = c("Beginner 1",
"Beginner 10", "Beginner 3", "Beginner 5", "Expert 1", "Expert 10",
"Expert 2", "intermediate 1", "intermediate 7 & 8", "intermediate 8"
), class = "factor")