Binning with an NA group in R

Question (votes: 0, answers: 2)

I have been using the following function to create evenly sized bin variables:

## Even Bins Function
evenbins <- function(x, bin.count = 5, order = TRUE) {
  bin.size <- rep(length(x) %/% bin.count, bin.count)
  bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1,0)
  bin <- rep(1:bin.count, bin.size)
  if(order) {
    bin <- bin[rank(x, ties.method = "random")]
  }
  return(factor(bin, levels = 1:bin.count, ordered = order))
}

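This works well for binning numeric values. However, it groups the NAs into the last bins (in this case, the 5th bin). So when the data is pivoted, it does something like this: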

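I would like to adjust the function so that NAs are excluded from the initial binning and kept as NA values, so that grouping by the bin column produces the following result: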

Thanks in advance for reading and for any help!

Sample code to work with:

##set up fake dataset

df1 <- data.frame(x = c(1:450))

df2 <- data.frame(x = 1:50)
df2$x <- NA

df3 <- rbind (df1, df2 )


## Even Bins Function
evenbins <- function(x, bin.count = 5, order = TRUE) {
  bin.size <- rep(length(x) %/% bin.count, bin.count)
  bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1,0)
  bin <- rep(1:bin.count, bin.size)
  if(order) {
    bin <- bin[rank(x, ties.method = "random")]
  }
  return(factor(bin, levels = 1:bin.count, ordered = order))
}

df3$Bin <- evenbins(df3$x)
df3$isNA <- ifelse(is.na(df3$x) == TRUE, "# NA","complete")


t1 <- cbind(
  table(df3$Bin)
  ,table(df3$Bin, df3$isNA)
)
r function bin binning
2 Answers

0 votes

This is a simple modification: count the number of NA values, remove them, then tack them back on at the end:

evenbins <- function(x, bin.count = 5, order = TRUE) {
  # Count the NAs, then drop them so they do not take part in the binning
  n_na <- sum(is.na(x))
  x <- na.omit(x)
  bin.size <- rep(length(x) %/% bin.count, bin.count)
  bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1,0)
  bin <- rep(1:bin.count, bin.size)
  if(order) {
    bin <- bin[rank(x, ties.method = "random")]
  }
  return(factor(c(bin, rep(NA, n_na)), levels = 1:bin.count, ordered = order))
}

df3 <- rbind (df1, df2 )
df3$Bin <- evenbins(df3$x)
df3$isNA <- ifelse(is.na(df3$x), "# NA","complete")
cbind(
  table(df3$Bin, useNA = "always")
  ,table(df3$Bin, df3$isNA, useNA = "always")
)
#         # NA complete <NA>
# 1    90    0       90    0
# 2    90    0       90    0
# 3    90    0       90    0
# 4    90    0       90    0
# 5    90    0       90    0
# <NA> 50   50        0    0
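
One caveat worth noting: the function above appends the NA entries at the end of the result, which lines up with df3 only because all the NA values sit at the end of x. If the NAs can appear anywhere in the vector, the bins need to be written back at their original positions. A minimal sketch of that idea follows; the name evenbins_keep_na and the small test vector are made up for illustration and are not part of the original answer.

```r
# Hypothetical variant: bin only the non-missing values, leave NA where it was.
evenbins_keep_na <- function(x, bin.count = 5, order = TRUE) {
  ok <- !is.na(x)                      # positions of the non-missing values
  n  <- sum(ok)
  # Split n as evenly as possible; the first bins absorb the remainder
  bin.size <- rep(n %/% bin.count, bin.count) +
    as.integer(seq_len(bin.count) <= n %% bin.count)
  bin <- rep(seq_len(bin.count), bin.size)
  if (order) {
    bin <- bin[rank(x[ok], ties.method = "random")]
  }
  out <- rep(NA_integer_, length(x))
  out[ok] <- bin                       # write bins back at original positions
  factor(out, levels = seq_len(bin.count), ordered = order)
}

# Made-up example with NAs scattered through the vector
x <- c(NA, 10, 3, NA, 7, 1, 8, 2, NA, 5)
b <- evenbins_keep_na(x, bin.count = 2)
table(b, useNA = "always")
```

With the question's df3 this should give the same counts as the version above, since there the NAs already sit at the end.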

0 votes

Here is a fairly simple base R solution:

as.data.frame(  table( (df3+100) %/% 100, useNA="always")  , make.names = TRUE)
     x Freq
1    1   99
2    2  100
3    3  100
4    4  100
5    5   51
6 <NA>   50

The key trick is to count the NAs by adding the useNA argument to table. The +100 merely shifts the values so that the bin labels start at 1.
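
Note that integer division yields fixed-width bins (each label covers a range of 100 values of x) rather than bins with equal counts, which is why the first and last rows above differ from the rest. The effect of useNA can be seen on a small made-up vector; v, bins, t_default and t_all are illustrative names only:

```r
# Made-up data: four numbers and two missing values
v <- c(5, 150, 250, NA, 99, NA)

# Fixed-width labels: 1 for 0-99, 2 for 100-199, 3 for 200-299
bins <- (v + 100) %/% 100

t_default <- table(bins)                    # NA values are dropped silently
t_all     <- table(bins, useNA = "always")  # adds an <NA> count

t_default
t_all
```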
