How to relevel a factor whose levels combine two levels with "&"


My data has an unexpected factor level that joins two levels with &: "intermediate 7 & 8".

What is the best way to relevel this value? In the future, other levels may be combined the same way, e.g. "Beginner 3 & 4".

#Relevel factors
Sample <- as.factor(c("Beginner 1","intermediate 8", "intermediate 7 & 8", 
                     "Expert 2","Expert 10","Beginner 3 & 4","Beginner 5",
                     "Beginner 10", "intermediate 1", "Expert 1", NA))
newLevel <- factor(c("NA", paste0("Beginner ", 1:10), paste0("intermediate ", 1:10), 
                   paste0("Expert ", 1:10)))
newSample <- factor(Sample, levels=newLevel)

newSample
#  [1] Beginner 1     intermediate 8 <NA>           Expert 2       Expert 10
#  [6] <NA>           Beginner 5     Beginner 10    intermediate 1 Expert 1
# [11] <NA>
# 31 Levels: NA Beginner 1 Beginner 2 Beginner 3 Beginner 4 Beginner 5 ... Expert 10

#Change factor to Numeric
SampleNum <- as.numeric(factor(Sample, levels=newLevel))
SampleNum
#  [1]  2 19 NA 23 31 NA  6 11 12 22 NA

So "intermediate 7 & 8" becomes NA, although it should sit between "intermediate 7" and "intermediate 8".
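Why the NA appears: when levels= is given, factor() looks each value up in that set by exact match (internally it calls match()), and any value without an identical entry becomes NA. A minimal sketch with a shortened, illustrative level set; lv and x are hypothetical names:

```r
# Shortened, illustrative level set (hypothetical name `lv`)
lv <- paste0("intermediate ", 1:10)
x <- c("intermediate 7", "intermediate 7 & 8", "intermediate 8")

# Exact lookup: the combined label has no identical entry in `lv`
pos <- match(x, lv)
print(pos)
# [1]  7 NA  8

# factor() with an explicit levels= set maps it to NA in the same way
f <- factor(x, levels = lv)
print(is.na(f))
# [1] FALSE  TRUE FALSE
```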

Is there a good way to split such a level up so that it can also be converted to a number?

Tags: r, ampersand
1 Answer

Strip out the digits and, where a level contains two numbers, take their mean to get a quasi-numeric suffix:

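# Replace every run of non-digits with a space, split on the spaces and
# average the numbers: "Beginner 3 & 4" -> 3.5, "Expert 10" -> 10
suffix <- sapply(strsplit(trimws(gsub("\\D+", " ", levels(Sample))), " "), function(x) 
  mean(as.numeric(x)))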
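The same suffix can also be computed without the replace-and-split step, by extracting every run of digits directly with the base R functions regmatches() and gregexpr(). This is an alternative formulation, not the answer's code; level_suffix is a hypothetical helper name, and a label containing no digits would give NaN.

```r
# Hypothetical helper: average all integers found in each label
level_suffix <- function(labels) {
  # gregexpr() locates every run of digits; regmatches() extracts them as strings
  nums <- regmatches(labels, gregexpr("[0-9]+", labels))
  # One mean per label; a label without digits yields NaN
  vapply(nums, function(d) mean(as.numeric(d)), numeric(1))
}

s <- level_suffix(c("Beginner 3 & 4", "intermediate 7 & 8", "Expert 10"))
print(s)
# [1]  3.5  7.5 10.0
```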

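Then, to get the prefix, use cat.df as a lookup table that maps each category to a larger number (100, 200, 300), so that the categories end up in the right order: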

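# Lookup table: category name -> offset that fixes the category order
cat.df <- data.frame(c("Beginner", "intermediate", "Expert"),
                      (1:3)*100)
# Take the category name (the non-digit text before the number) and look up its offset
prefix <- sapply(gsub("(\\D+)\\s.*", "\\1", levels(Sample)), function(x, y) 
  cat.df[match(x, y), 2], cat.df[, 1])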
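The sapply() lookup can also be written as a plain named-vector lookup. A sketch with a hypothetical weight vector cat_weight; here the category name is taken as everything before the first digit, and an unknown category comes out as NA:

```r
# Hypothetical weights: one block of 100 per category, in the desired order
cat_weight <- c(Beginner = 100, intermediate = 200, Expert = 300)

lv <- c("Beginner 10", "Expert 2", "intermediate 7 & 8")

# Category name: drop the first digit, everything after it, and any spaces before it
category <- sub("[[:space:]]*[0-9].*$", "", lv)

# Index the named vector by name; unname() drops the names carried over
prefix <- unname(cat_weight[category])
print(prefix)
# [1] 100 300 200
```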

That is all it takes to relevel the Sample vector:

# Reorder the existing levels by their combined score (prefix + suffix)
new.Sample <- factor(Sample, levels=levels(Sample)[order(prefix + suffix)])
new.Sample
#  [1] Beginner 1         intermediate 8     intermediate 7 & 8 Expert 2          
#  [5] Expert 10          Beginner 3 & 4     Beginner 5         Beginner 10       
#  [9] intermediate 1     Expert 1           <NA>              
# 10 Levels: Beginner 1 Beginner 3 & 4 Beginner 5 Beginner 10 ... Expert 10
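The prefix + suffix sum only sorts correctly while every number stays below 100 (a hypothetical "Beginner 150" would score 250 and sort after "intermediate 10"). A sketch that avoids this by passing the category rank and the number to order() as two separate keys; relevel_combined and its categories argument are hypothetical names:

```r
relevel_combined <- function(x, categories) {
  x <- as.factor(x)
  lv <- levels(x)
  # Primary key: position of the category name in `categories`
  cat_rank <- match(sub("[[:space:]]*[0-9].*$", "", lv), categories)
  # Secondary key: mean of all numbers in the label ("7 & 8" -> 7.5)
  nums <- regmatches(lv, gregexpr("[0-9]+", lv))
  num_key <- vapply(nums, function(d) mean(as.numeric(d)), numeric(1))
  # order() breaks ties in the first key using the second key
  factor(x, levels = lv[order(cat_rank, num_key)])
}

Sample <- factor(c("Beginner 1", "intermediate 8", "intermediate 7 & 8",
                   "Expert 2", "Expert 10", "Beginner 3 & 4", "Beginner 5",
                   "Beginner 10", "intermediate 1", "Expert 1", NA))
new.Sample <- relevel_combined(Sample, c("Beginner", "intermediate", "Expert"))
print(levels(new.Sample))
```

Because the keys are compared separately, the size of the numbers no longer matters, and no offset such as 100 has to be chosen.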

Check:

data.frame(sort(new.Sample), as.numeric(sort(new.Sample)))
#      sort.new.Sample. as.numeric.sort.new.Sample..
# 1          Beginner 1                            1
# 2      Beginner 3 & 4                            2
# 3          Beginner 5                            3
# 4         Beginner 10                            4
# 5      intermediate 1                            5
# 6  intermediate 7 & 8                            6
# 7      intermediate 8                            7
# 8            Expert 1                            8
# 9            Expert 2                            9
# 10          Expert 10                           10
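as.numeric() on the relevelled factor returns each element's level position (1 to 10), which is what the check above shows. If a number that keeps the in-between meaning of a combined label is preferred instead (7.5 for "intermediate 7 & 8"), one option is to compute the score per element rather than per level. A sketch; quasi_score, categories and step are hypothetical names, and the spacing of 100 per category is the same assumption as in cat.df:

```r
quasi_score <- function(x, categories, step = 100) {
  x <- as.character(x)
  out <- rep(NA_real_, length(x))  # missing values stay NA
  ok <- !is.na(x)
  # Category block: step for the first category, 2 * step for the second, ...
  block <- match(sub("[[:space:]]*[0-9].*$", "", x[ok]), categories) * step
  # Mean of the numbers in each label ("7 & 8" -> 7.5)
  nums <- regmatches(x[ok], gregexpr("[0-9]+", x[ok]))
  out[ok] <- block + vapply(nums, function(d) mean(as.numeric(d)), numeric(1))
  out
}

scores <- quasi_score(c("Beginner 3 & 4", "intermediate 7 & 8", "Expert 10", NA),
                      c("Beginner", "intermediate", "Expert"))
print(scores)
# [1] 103.5 207.5 310.0    NA
```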

Data:

Sample <- structure(c(1L, 10L, 9L, 7L, 6L, 3L, 4L, 2L, 8L, 5L, NA), .Label = c("Beginner 1", 
"Beginner 10", "Beginner 3 & 4", "Beginner 5", "Expert 1", "Expert 10", 
"Expert 2", "intermediate 1", "intermediate 7 & 8", "intermediate 8"
), class = "factor")
© www.soinside.com 2019 - 2024. All rights reserved.