How do I use missRanger to impute missing integer values?


I am learning imputation by using R and missRanger to impute missing variables that must be integers. However, I get the following error:

## Error: Assigned data `if (...) NULL` must be compatible with existing data.
## i Error occurred for column `beds`.
## x Can't convert from <double> to <integer> due to loss of precision.
## * Locations: 1, 2.

It seems that I cannot impute integer values directly, but I can if I first convert them to doubles.

Here is a reprex:

library(tidyverse)
library(missRanger)

# Here is a sample of the data
reprex_df

## # A tibble: 9 x 5
##    beds baths garages  price property_type
##   <int> <int>   <int>  <int> <chr>        
## 1    NA    NA      NA 770000 house        
## 2     2     1       0 300000 apartment    
## 3     2     2       2 735000 apartment    
## 4    NA    NA      NA 550000 apartment    
## 5     4     2       3 500000 house        
## 6     2     1       0 400000 apartment    
## 7     4     2       2 607000 house        
## 8     3     2       2 590000 house        
## 9     4     1       2 710000 house

# Try to impute missing bedrooms
imputed <- reprex_df %>% 
  missRanger()

## 
## Missing value imputation by random forests
## 
##   Variables to impute:       beds, baths, garages
##   Variables used to impute:  beds, baths, garages, price, property_type
## iter 1:  

## Error: Assigned data `if (...) NULL` must be compatible with existing data.
## i Error occurred for column `beds`.
## x Can't convert from <double> to <integer> due to loss of precision.
## * Locations: 1, 2.

# Convert integers to numerics and try again
imputed2 <- reprex_df %>% 
  mutate_if(is.integer,
            as.numeric) %>% 
  missRanger()

## 
## Missing value imputation by random forests
## 
##   Variables to impute:       beds, baths, garages
##   Variables used to impute:  beds, baths, garages, price, property_type
## iter 1:  ...
## iter 2:  ...
## iter 3:  ...
## iter 4:  ...
## iter 5:  ...

# That works, but decimal rooms don't make sense
imputed2

## # A tibble: 9 x 5
##    beds baths garages  price property_type
##   <dbl> <dbl>   <dbl>  <dbl> <chr>        
## 1  3.44  1.86    2.15 770000 house        
## 2  2     1       0    300000 apartment    
## 3  2     2       2    735000 apartment    
## 4  2.77  1.83    1.84 550000 apartment    
## 5  4     2       3    500000 house        
## 6  2     1       0    400000 apartment    
## 7  4     2       2    607000 house        
## 8  3     2       2    590000 house        
## 9  4     1       2    710000 house

How can I use missRanger to impute missing integers?

r imputation
1 Answer

Calling a dataset "reprex" does not make the example reproducible...

Because missRanger cannot change how a tibble reacts to type conversion internally, here are two suggestions:

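  1. Convert the tibble to a data.frame before calling missRanger (this is my favorite).

  2. Use the argument pmm.k to apply predictive mean matching between iterations. This has the nice side effect of filling the gaps with realistic values: integers stay integers, and so on.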
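A minimal sketch of how the two suggestions might be combined on the question's data. The `reprex_df` object was never shared, so it is rebuilt by hand here from the printed tibble (an assumption). The values `pmm.k = 3` and `seed = 1` are arbitrary, and the sketch relies on `pmm.k` rather than on the data.frame conversion to keep the imputed values whole numbers.

```r
library(missRanger)

# Hand-built copy of the data printed in the question (assumption:
# the original reprex_df object is not available).
reprex_df <- data.frame(
  beds    = c(NA, 2L, 2L, NA, 4L, 2L, 4L, 3L, 4L),
  baths   = c(NA, 1L, 2L, NA, 2L, 1L, 2L, 2L, 1L),
  garages = c(NA, 0L, 2L, NA, 3L, 0L, 2L, 2L, 2L),
  price   = c(770000L, 300000L, 735000L, 550000L, 500000L,
              400000L, 607000L, 590000L, 710000L),
  property_type = c("house", "apartment", "apartment", "apartment",
                    "house", "apartment", "house", "house", "house")
)

# Suggestion 1: pass a plain data.frame rather than a tibble.
# (A no-op for this hand-built copy, but needed if the data is a tibble.)
plain_df <- as.data.frame(reprex_df)

# Suggestion 2: predictive mean matching, so every imputed value is
# taken from the values actually observed in that column.
imputed <- missRanger(plain_df, pmm.k = 3, seed = 1, verbose = 0)

str(imputed)
```

With only nine rows the imputed values are not meaningful; the point is just that the call runs and the filled-in room counts come from the observed ones.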

The missRanger vignettes explain these concepts; see https://cran.r-project.org/web/packages/missRanger/index.html

Disclaimer: I am the package maintainer of missRanger.

library(missRanger)
library(tidyverse)

# Example data
mtcars2 <- mtcars %>% 
  as_tibble() %>% 
  mutate(cyl = as.integer(cyl)) %>% 
  generateNA()

missRanger(mtcars2, pmm.k = 3, seed = 153)

# Gives
# # A tibble: 32 x 11
# mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
# <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  21       6  160    105  3.9   2.62  16.5     0     1     4     4
#  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#  21.4     6  258    110  3.08  3.22  19.4     1     0     3     2
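
To see why predictive mean matching keeps integers as integers, here is a rough base-R sketch of the idea. It is not missRanger's implementation: `lm()` stands in for the random forest, and `pmm_impute`, `toy`, and the parameter names are hypothetical, made up for illustration.

```r
# Toy predictive mean matching (PMM) for one column, base R only.
# Assumption: lm() replaces the random forest that missRanger would fit.
pmm_impute <- function(df, target, predictor, k = 3, seed = 1) {
  set.seed(seed)
  miss <- is.na(df[[target]])
  fit <- lm(reformulate(predictor, response = target), data = df[!miss, ])
  pred_obs  <- predict(fit, newdata = df[!miss, ])
  pred_miss <- predict(fit, newdata = df[miss, ])
  observed  <- df[[target]][!miss]

  # For each missing row, find the k observed rows whose predictions are
  # closest, then copy the observed value of one of them at random.
  donors <- vapply(pred_miss, function(p) {
    nearest <- order(abs(pred_obs - p))[seq_len(min(k, length(observed)))]
    observed[nearest[sample.int(length(nearest), 1)]]
  }, FUN.VALUE = observed[1])

  df[[target]][miss] <- unname(donors)
  df
}

# Small made-up data set: integer bedroom counts, price in thousands.
toy <- data.frame(
  beds  = c(NA, 2L, 2L, NA, 4L, 2L, 4L, 3L, 4L),
  price = c(770, 300, 735, 550, 500, 400, 607, 590, 710)
)
filled <- pmm_impute(toy, target = "beds", predictor = "price")
str(filled)
```

Because each gap is filled with a value copied from an observed row, the column never receives a fractional prediction, so its integer type is preserved.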