Here is a vector of values between 0 and 1:
a <- runif(100, 0, 1)
I would like to make the following conversions:
>= 0.975 becomes AA+
<= 0.025 becomes AA-
< 0.975 && > 0.025 becomes AA
a[a >= 0.975] = 'AA+'
sum(a == 'AA+')
3
a[a <= 0.025] = 'AA-'
sum(a == 'AA-')
2
a[a > 0.025 && a < 0.975] = 'AA'
sum(a == 'AA')
100
In other words:
a
[1] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[16] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[31] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[46] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[61] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[76] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[91] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
I'm confused about why this happens. Why does AA overwrite the first two conversions?
Note that as soon as you do this:
a[a >= 0.975] = 'AA+'
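the whole vector a is converted to character, which is undesirable: every later comparison such as a <= 0.025 becomes a string comparison. It is better to do this: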
aa <- character(length(a)) # pre-allocate aa
aa[a >= 0.975] <- "AA+"
aa[a > 0.025 & a < 0.975] <- "AA" # note &, not &&
aa[a <= 0.025] <- "AA-"
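To see the coercion concretely, here is a small sketch with made-up values (the vectors x and y are hypothetical, not from the question). After the first character assignment, 0.025 is compared as the string "0.025", and a tiny value that R prints in scientific notation sorts after it, so it is no longer caught by the "AA-" condition:

```r
x <- c(0.00005, 0.01, 0.5, 0.99)  # hypothetical values, one of them very small

y <- x
y[y >= 0.975] <- "AA+"            # assigning a string coerces all of y to character
print(y)                          # 0.00005 is now stored as the string "5e-05"

numeric_test <- x <= 0.025        # compares numbers
string_test  <- y <= 0.025        # compares strings: 0.025 becomes "0.025"
print(numeric_test)               # TRUE  TRUE FALSE FALSE
print(string_test)                # FALSE TRUE FALSE FALSE ("5e-05" sorts after "0.025")
```

Building the result in a separate character vector, as above, avoids this because every condition is evaluated on the original numeric a.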
Here are some alternatives:
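1) cut
cut will work, but the value 0.975 will be assigned "AA" rather than "AA+", because cut uses right-closed intervals by default: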
cut(a, c(0, 0.025, 0.975, 1), labels = c("AA-", "AA", "AA+"))
2) subscripting
c("AA-", "AA", "AA+")[ 1 + (a > 0.025) + (a >= 0.975) ]
3) ifelse
ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))
4) case_when
library(dplyr)
case_when(a <= 0.025 ~ "AA-",
          a < 0.975  ~ "AA",
          TRUE       ~ "AA+")
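As a quick sanity check, the base-R versions above can be run side by side on the same numbers. This is a sketch on hypothetical data (20 random values plus the two exact boundary points); case_when is left out only so the snippet needs no packages. The pre-allocation, subscripting and ifelse versions agree everywhere, while cut differs only at exactly 0.975:

```r
set.seed(1)
x <- c(runif(20), 0.025, 0.975)   # hypothetical data, including both boundary values
lab <- c("AA-", "AA", "AA+")

# pre-allocated character vector, conditions evaluated on numeric x
by_preallocation <- character(length(x))
by_preallocation[x >= 0.975] <- "AA+"
by_preallocation[x > 0.025 & x < 0.975] <- "AA"
by_preallocation[x <= 0.025] <- "AA-"

# TRUE counts as 1 and FALSE as 0, so the index is 1, 2 or 3
by_subscript <- lab[1 + (x > 0.025) + (x >= 0.975)]

by_ifelse <- ifelse(x <= 0.025, "AA-", ifelse(x < 0.975, "AA", "AA+"))

# cut() returns a factor; convert it so it can be compared with the others
by_cut <- as.character(cut(x, c(0, 0.025, 0.975, 1), labels = lab))

print(identical(by_preallocation, by_subscript))  # TRUE
print(identical(by_subscript, by_ifelse))         # TRUE
print(x[by_cut != by_subscript])                  # only the boundary value 0.975
```

Note that cut would also return NA for a value of exactly 0 unless include.lowest = TRUE is set; runif does not generate exact 0s, so this does not arise here.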
1) Modifying the original solution
We need to use a single & instead of &&:
a[a > 0.025 & a < 0.975] = 'AA'
table(a)
# a
# AA AA- AA+
# 92 5 3
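2) Explanation
According to ?"&":
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right, examining only the first element of each vector. Evaluation proceeds only until the result is determined.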
The difference is easy to see: with && the logical condition returns a single element
a > 0.025 && a < 0.975
#[1] TRUE
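which is recycled, so all elements are replaced with 'AA'. (In R 4.3.0 and later, using && with an operand longer than one is an error instead.)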
whereas with & we get one value per element:
a > 0.025 & a < 0.975
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [13] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
# [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
# [37] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [61] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [73] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
# [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
# [97] TRUE TRUE TRUE TRUE
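A minimal sketch contrasting the two operators on a short hypothetical vector x. The tryCatch wrapper is there only so the snippet runs regardless of how your R version treats && with longer vectors:

```r
x <- c(0.5, 0.01, 0.99)  # hypothetical values; only x[1] lies in (0.025, 0.975)

# & is vectorised: one TRUE/FALSE per element, usable as an index
elementwise <- x > 0.025 & x < 0.975
print(elementwise)       # TRUE FALSE FALSE

# && is meant for single TRUE/FALSE values (e.g. inside if()). Depending on
# the R version it uses only x[1], warns, or signals an error.
scalar <- tryCatch(x > 0.025 && x < 0.975,
                   warning = function(w) "warning",
                   error   = function(e) "error")
print(scalar)
```

In the question, the old behaviour produced a single TRUE (from a[1]), and a[TRUE] selects every element, which is why everything became "AA".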
3) Alternative solution
If we want a better approach, there is findInterval:
c("AA-", "AA", "AA+")[findInterval(a, c(0, 0.025, 0.975))]
4) replace
Another option is replace:
library(dplyr) # for the %>% pipe
# conditions use the numeric a, not the partly converted character result
a %>%
  replace(a >= 0.975, 'AA+') %>%
  replace(a <= 0.025, 'AA-') %>%
  replace(a > 0.025 & a < 0.975, 'AA')
Data
set.seed(42)
a <- runif(100, 0, 1)
a[a >= 0.975] = 'AA+'
a[a <= 0.025] = 'AA-'
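A quick boundary check for the findInterval approach in 3), using hypothetical numeric values x (findInterval needs the numeric vector, not one already partly converted to character). With the default left-closed bins, 0.025 lands in the middle bin, so it becomes "AA" rather than "AA-" as the question specifies:

```r
x <- c(0.01, 0.025, 0.5, 0.975, 0.99)   # hypothetical boundary test values (numeric)

# Default findInterval() bins are left-closed: [0, 0.025), [0.025, 0.975), [0.975, Inf)
idx <- findInterval(x, c(0, 0.025, 0.975))
print(idx)                               # 1 2 2 3 3

labels_found <- c("AA-", "AA", "AA+")[idx]
print(labels_found)                      # "AA-" "AA" "AA" "AA+" "AA+"
```

Values below 0 would get index 0, and a zero index is silently dropped when subscripting, so the result could come out shorter than x; with runif(100, 0, 1) this cannot happen.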