R - How to relevel a factor that combines two levels with &

My data has an unexpected factor level that combines two levels with &: "intermediate 7 & 8".

What is the best way to relevel this value? In the future, other levels may be combined the same way, e.g. "Beginner 3 & 4".

    #Relevel factors
    Sample <-as.factor(c("Beginner 1","intermediate 8", "intermediate 7 & 8", 
                 "Expert 2","Expert 10","Beginner 3","Beginner 5","Beginner 10",
                 "intermediate 1","Expert 1",NA))
    newLevel<-factor(c("NA", paste0("Beginner ", 1:10), paste0("intermediate ",1:10), paste0("Expert ",1:10)))
    newSample<-factor(Sample, levels=newLevel)

    > newSample
     [1] Beginner 1     intermediate 8 <NA>           Expert 2       Expert 10      Beginner 3     Beginner 5     Beginner 10    intermediate 1
    [10] Expert 1       <NA>          
    31 Levels: NA Beginner 1 Beginner 2 Beginner 3 Beginner 4 Beginner 5 Beginner 6 Beginner 7 Beginner 8 Beginner 9 Beginner 10 ... Expert 10        

    #Change factor to Numeric
    SampleNum<-as.numeric(factor(Sample, levels=newLevel))
    > SampleNum
     [1]  2 19 NA 23 31  4  6 11 12 22 NA

So "intermediate 7 & 8" is treated as NA, although it should fall between "intermediate 7" and "intermediate 8".
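The NA comes from how factor() uses its levels= argument: any value whose text is not exactly one of the supplied levels is coded as NA. A minimal sketch that lists which existing levels have no match, assuming the question's data; newLevel is rebuilt here as a plain character vector, which behaves the same as the question's factor version when passed to levels=.

```r
# The question's data and target levels, rebuilt so this snippet runs on its own
Sample <- as.factor(c("Beginner 1", "intermediate 8", "intermediate 7 & 8",
                      "Expert 2", "Expert 10", "Beginner 3", "Beginner 5",
                      "Beginner 10", "intermediate 1", "Expert 1", NA))
newLevel <- c("NA", paste0("Beginner ", 1:10),
              paste0("intermediate ", 1:10), paste0("Expert ", 1:10))

# factor(x, levels = ...) codes any value not found in 'levels' as NA,
# so these are exactly the values that will be lost
unmatched <- setdiff(levels(Sample), newLevel)
print(unmatched)
# [1] "intermediate 7 & 8"

# Confirm: the combined level becomes NA after releveling
stopifnot(is.na(factor("intermediate 7 & 8", levels = newLevel)))
```

Running this check before releveling flags any future combined level such as "Beginner 3 & 4" in the same way.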

Is there a good way to split it up so that it can still be converted to a number?

Many thanks!

Tags: r, ampersand
Answer

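You can strip out the numbers in each level, take their mean, and reassign the levels. Don't confuse levels= with labels=: labels= renames the existing levels, whereas levels= would look for values matching the new strings and turn everything else into NA.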

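    # Text before the first space: "Beginner", "intermediate", "Expert"
    prefix <- gsub("\\s(.*)", "", levels(Sample))
    # Replace runs of non-digits with spaces, split, and average the numbers:
    # "intermediate 7 & 8" -> c("7", "8") -> 7.5
    suffix <- sapply(strsplit(trimws(gsub("\\D+", " ", levels(Sample))), " "),
                     function(x) mean(as.numeric(x)))

    new.levels <- paste(prefix, suffix)
    # labels= renames the existing levels; passing levels= explicitly keeps
    # each new label paired with its old level
    new.Sample <- factor(Sample, levels = levels(Sample), labels = new.levels)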
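To reuse the same idea for future combined levels such as "Beginner 3 & 4", the steps can be wrapped in a function. This is a sketch under stated assumptions, not part of the original answer: average_combined_levels is a hypothetical name, the prefix is assumed to be the text before the first whitespace, and the numbers are assumed to be whole numbers (a level like "7.5" would be split into 7 and 5).

```r
# Hypothetical helper (not from the original answer): replaces each level of a
# factor with "<prefix> <mean of the numbers in the level>", for example
# "intermediate 7 & 8" -> "intermediate 7.5". Assumes the prefix is the text
# before the first whitespace and the numbers are whole numbers.
average_combined_levels <- function(f) {
  old <- levels(f)
  prefix <- sub("\\s.*", "", old)
  # Turn each run of non-digits into a space, then split into number strings
  nums <- strsplit(trimws(gsub("\\D+", " ", old)), " ")
  suffix <- vapply(nums, function(x) mean(as.numeric(x)), numeric(1))
  # levels = old pairs each new label with its original level, even if a level
  # does not occur in the data
  factor(f, levels = old, labels = paste(prefix, suffix))
}

f <- factor(c("Beginner 3 & 4", "Beginner 5", "intermediate 7 & 8"))
g <- average_combined_levels(f)
print(levels(g))
# levels: "Beginner 3.5", "Beginner 5", "intermediate 7.5"
```

vapply() is used instead of sapply() so the result is guaranteed to be one number per level.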

Compare the old and new levels:

    cbind(Sample = levels(Sample), new.Sample = levels(new.Sample))
    #       Sample               new.Sample
    #  [1,] "Beginner 1"         "Beginner 1"
    #  [2,] "Beginner 10"        "Beginner 10"
    #  [3,] "Beginner 3"         "Beginner 3"
    #  [4,] "Beginner 5"         "Beginner 5"
    #  [5,] "Expert 1"           "Expert 1"
    #  [6,] "Expert 10"          "Expert 10"
    #  [7,] "Expert 2"           "Expert 2"
    #  [8,] "intermediate 1"     "intermediate 1"
    #  [9,] "intermediate 7 & 8" "intermediate 7.5"
    # [10,] "intermediate 8"     "intermediate 8"

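Convert to numeric (these are the factor's internal codes, which follow the alphabetical order of the levels, so "Beginner 10" gets 2):

    as.numeric(new.Sample)
    # [1]  1 10  9  7  6  3  4  2  8  5 NA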
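If you instead want a number that reflects the skill order used in the question (Beginner 1 to 10, then intermediate 1 to 10, then Expert 1 to 10), with "intermediate 7 & 8" landing between intermediate 7 and 8, one option is to compute a score directly from the label. A minimal sketch, assuming each group has exactly 10 steps and the order Beginner < intermediate < Expert from the question's newLevel; level_score is a hypothetical helper, not part of the original answer.

```r
# The question's data (same values as in the Data section)
Sample <- factor(c("Beginner 1", "intermediate 8", "intermediate 7 & 8",
                   "Expert 2", "Expert 10", "Beginner 3", "Beginner 5",
                   "Beginner 10", "intermediate 1", "Expert 1", NA))

# Hypothetical helper: group offset + mean of the numbers in the label.
# Assumes 10 steps per group and the order Beginner < intermediate < Expert.
level_score <- function(x, groups = c("Beginner", "intermediate", "Expert"),
                        steps = 10) {
  x <- as.character(x)
  group <- match(sub("\\s.*", "", x), groups)          # 1, 2 or 3 (NA if unknown)
  nums <- strsplit(trimws(gsub("\\D+", " ", x)), " ")  # "7 & 8" -> c("7", "8")
  avg <- vapply(nums, function(n) mean(as.numeric(n)), numeric(1))
  (group - 1) * steps + avg                            # NA input stays NA
}

score <- level_score(Sample)
print(score)
```

Here "intermediate 7 & 8" scores 17.5, between intermediate 7 (17) and intermediate 8 (18). Adding 1 reproduces the question's numbering, which reserves position 1 for the "NA" level.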

Data:

    Sample <- structure(c(1L, 10L, 9L, 7L, 6L, 3L, 4L, 2L, 8L, 5L, NA),
                        levels = c("Beginner 1", "Beginner 10", "Beginner 3", "Beginner 5",
                                   "Expert 1", "Expert 10", "Expert 2", "intermediate 1",
                                   "intermediate 7 & 8", "intermediate 8"),
                        class = "factor")
© www.soinside.com 2019 - 2024. All rights reserved.