Replacing numbers with text using conditions


Here is a vector containing values between 0 and 1:

a <- runif(100, 0, 1)

I want to apply the following conversion:

>= 0.975 becomes AA+  
<= 0.025 becomes AA-  
> 0.025 and < 0.975 becomes AA

This is what I tried:

a[a >= 0.975] = 'AA+'  
sum(a == 'AA+')  
3

a[a <= 0.025] = 'AA-'  
sum(a == 'AA-')   
2

a[a > 0.025 && a < 0.975] = 'AA'  
sum(a == 'AA')  
100

In other words:

a

[1] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [16] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [31] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [46] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [61] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [76] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [91] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"

I am confused about why this happens. Why does AA overwrite the first two conversions?

2 Answers

1 vote

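Note that as soon as you do this:

a[a >= 0.975] = 'AA+'  

the entire vector a is coerced to character, which is undesirable because every later comparison, such as a <= 0.025, then compares strings rather than numbers. It would be better to do this: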

aa <- character(length(a))  # pre-allocate aa
aa[a >= 0.975] <- "AA+"
aa[a > 0.025 & a < 0.975] <- "AA"  # note &, not &&
aa[a <= 0.025] <- "AA-"
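
The coercion is easy to verify on a small vector. The following is a minimal sketch; the vector x and its values are made up for illustration and do not come from the question.

```r
# Hypothetical example values, chosen to include one very small number
x <- c(0.00001, 0.5, 0.99)
class(x)                    # "numeric"

x[x >= 0.975] <- "AA+"      # assigning a string coerces the whole vector
class(x)                    # "character"
x                           # "1e-05" "0.5" "AA+"

# Later comparisons are now string comparisons, not numeric ones:
x[1] <= 0.025               # compares "1e-05" with "0.025" as text
1e-05 <= 0.025              # TRUE when compared as numbers
```

Writing the labels into a separate vector, as above, keeps a numeric so that each condition is evaluated on numbers.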

Here are some alternatives:

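1) cut. cut will work, but a value of exactly 0.975 is assigned "AA" rather than "AA+", and the result is a factor rather than a character vector:

cut(a, c(0, 0.025, 0.975, 1), labels = c("AA-", "AA", "AA+"))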
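
To see how cut treats the boundaries, it can help to run it on a few hand-picked values. This is only a sketch: the test vector x is hypothetical, and include.lowest = TRUE is an optional argument of cut that additionally places an exact 0 in the first interval.

```r
x <- c(0, 0.025, 0.5, 0.975, 1)
breaks <- c(0, 0.025, 0.975, 1)
labs <- c("AA-", "AA", "AA+")

# Default intervals are (0, 0.025], (0.025, 0.975], (0.975, 1]
r1 <- as.character(cut(x, breaks, labels = labs))
r1    # NA "AA-" "AA" "AA" "AA+"

# include.lowest = TRUE closes the first interval on the left: [0, 0.025]
r2 <- as.character(cut(x, breaks, labels = labs, include.lowest = TRUE))
r2    # "AA-" "AA-" "AA" "AA" "AA+"
```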

2) Subscripting

c("AA-", "AA", "AA+")[ 1 + (a > 0.025) + (a >= 0.975) ]
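
The subscript approach relies on logical values counting as 0 or 1 in arithmetic, so the sum is always 1, 2, or 3. A small sketch with a hypothetical vector x:

```r
x <- c(0.01, 0.025, 0.5, 0.975, 0.99)

# Each TRUE adds 1, giving an index into the label vector
idx <- 1 + (x > 0.025) + (x >= 0.975)
idx                              # 1 1 2 3 3

labels_out <- c("AA-", "AA", "AA+")[idx]
labels_out                       # "AA-" "AA-" "AA" "AA+" "AA+"
```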

3) ifelse

ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))

4) case_when

library(dplyr)

case_when( a <= 0.025 ~ "AA-",
           a < 0.975 ~ "AA",
           TRUE ~ "AA+")
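
One way to gain confidence in these alternatives is to check that they agree on the same input. The sketch below uses only base R, so case_when is left out, and the seed is an arbitrary choice.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
a <- runif(100, 0, 1)

# Pre-allocation approach
aa <- character(length(a))
aa[a >= 0.975] <- "AA+"
aa[a > 0.025 & a < 0.975] <- "AA"
aa[a <= 0.025] <- "AA-"

# Subscript and ifelse approaches
by_index  <- c("AA-", "AA", "AA+")[1 + (a > 0.025) + (a >= 0.975)]
by_ifelse <- ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))

identical(aa, by_index)          # TRUE
identical(aa, by_ifelse)         # TRUE
```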

0 votes

1) Fixing the original solution. We need to use a single & instead of &&:

a[a > 0.025 & a < 0.975] = 'AA'   
table(a)
# a
#  AA AA- AA+ 
#  92   5   3 

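2) Explanation. According to ?"&":

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.

(This wording is from older versions of the documentation. From R 4.3.0 onward, calling && with an argument of length greater than one is an error.)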

The difference is easy to see: with &&, the logical condition evaluates to a single element:

a > 0.025 && a < 0.975
#[1] TRUE

That single TRUE is recycled across the whole index, so every element is replaced with 'AA'.

If we use & instead, we get one logical value per element:

a > 0.025 & a < 0.975
#  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [13]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
# [25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
# [37] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [61]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [73]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
# [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
# [97]  TRUE  TRUE  TRUE  TRUE
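
The contrast can be reproduced with a short vector. This is a sketch with made-up values; because the behaviour of && on longer vectors depends on the R version, that call is wrapped in tryCatch rather than assuming one outcome.

```r
x <- c(0.5, 0.01, 0.99)

# Elementwise: one logical value per element
elementwise <- x > 0.025 & x < 0.975
elementwise                      # TRUE FALSE FALSE

# && is meant for length-one conditions, e.g. inside if ()
x[1] > 0.025 && x[1] < 0.975     # TRUE

# On a longer vector, older R versions used only the first element;
# newer versions signal a warning or an error instead.
res <- tryCatch(x > 0.025 && x < 0.975,
                warning = function(w) "warning",
                error = function(e) "error")
res
```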

3) Alternative solution. A more compact approach is findInterval, applied to the original numeric a:

c("AA-", "AA", "AA+")[findInterval(a, c(0, 0.025, 0.975))]
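
findInterval returns, for each value, the index of the interval it falls into, and by default the intervals are closed on the left. A sketch with hypothetical boundary values shows where 0.025 and 0.975 end up:

```r
x <- c(0.01, 0.025, 0.5, 0.975, 0.99)

# Intervals are [0, 0.025), [0.025, 0.975), [0.975, Inf)
idx <- findInterval(x, c(0, 0.025, 0.975))
idx                              # 1 2 2 3 3

labels_out <- c("AA-", "AA", "AA+")[idx]
labels_out                       # "AA-" "AA" "AA" "AA+" "AA+"
```

Compared with the rule in the question, only a value of exactly 0.025 is classified differently ("AA" instead of "AA-").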

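4) replace. Another option is replace. The conditions below are evaluated on the original numeric a, so no comparison is made against a vector that has already been coerced to character:

library(dplyr)  # for the %>% pipe
a %>%
  replace(a >= 0.975, 'AA+') %>%
  replace(a <= 0.025, 'AA-') %>%
  replace(a > 0.025 & a < 0.975, 'AA')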

Data (the state of a after the first two assignments in the question)

set.seed(42)
a <- runif(100, 0, 1)
a[a >= 0.975] = 'AA+'
a[a <= 0.025] = 'AA-'    