Fill missing values in a tibble based on a condition, using tidyverse syntax

Question (votes: 0, answers: 2)

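I am struggling to fill missing values in my tibble based on a condition.

I am trying to fill the NA values in the manufacturer column depending on whether the condition field is YES or NO. I only want to fill NA values for groups where the condition column contains YES.

I get the following error, which I cannot figure out:

no applicable method for 'fill' applied to an object of class "character"

Here is a simple reprex that hopefully gives some context for the problem I am trying to solve.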

df <- tibble(
  code = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  cost = c(5000, 4000, 3000, 2000, 40000, 30000, 20000, 10000, 5000),
  manufacturer = c("ManA", NA, NA, NA, "ManB", "ManB", NA, NA, "ManB"),
  condition = c("NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES", "YES")
) %>%
  group_by(code) %>%
  arrange(desc(cost), .by_group = TRUE) %>%
  mutate(manufacturer = if_else(condition == "YES", fill(manufacturer, .direction = "down"), manufacturer))

I included the mutate on the last line to show what I have tried so far; it is what produces the error above.

Any help would be appreciated.

r tidyverse fill
2 Answers
1 vote

You can create a copy of the manufacturer column, fill() everything, and then assign the original values back wherever the condition is not "YES".

library(dplyr)
library(tidyr)

df %>%
  #Create a copy of the column
  mutate(new_manufacturer = manufacturer) %>%
  group_by(code) %>%
  arrange(desc(cost), .by_group = TRUE) %>%
  # Fill everything
  fill(new_manufacturer, .direction = "down") %>%
  # Keep the new value only if condition is "YES"
  mutate(new_manufacturer = if_else(condition == "YES", 
                 new_manufacturer, manufacturer)) %>%
  ungroup()

#  code   cost manufacturer condition new_manufacturer
#  <chr> <dbl> <chr>        <chr>     <chr>           
#1 A      5000 ManA         NO        ManA            
#2 A      4000 NA           NO        NA              
#3 A      3000 NA           NO        NA              
#4 A      2000 NA           NO        NA              
#5 B     40000 ManB         YES       ManB            
#6 B     30000 ManB         YES       ManB            
#7 B     20000 NA           YES       ManB            
#8 B     10000 NA           YES       ManB            
#9 B      5000 ManB         YES       ManB            

The new_manufacturer column is the output you are looking for.
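
As for why the original attempt errors: tidyr's fill() is a data frame verb, so calling it on the manufacturer vector inside mutate() finds no method for a character vector. A minimal sketch of a vector-level alternative follows. It assumes the vctrs package (a dplyr dependency) is installed and uses its vec_fill_missing() function, which is not part of the answer above. It is just one way to keep everything inside a single mutate().

```r
library(dplyr)

df <- tibble(
  code = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  cost = c(5000, 4000, 3000, 2000, 40000, 30000, 20000, 10000, 5000),
  manufacturer = c("ManA", NA, NA, NA, "ManB", "ManB", NA, NA, "ManB"),
  condition = c("NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES", "YES")
)

result <- df %>%
  group_by(code) %>%
  arrange(desc(cost), .by_group = TRUE) %>%
  mutate(manufacturer = if_else(
    condition == "YES",
    # Assumption: vctrs::vec_fill_missing() works on plain vectors,
    # unlike tidyr::fill(), which expects a data frame
    vctrs::vec_fill_missing(manufacturer, direction = "down"),
    manufacturer
  )) %>%
  ungroup()

result
```

Unlike the copy-and-restore approach, this overwrites manufacturer in place, so keep a copy of the column if you still need the original values.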


0 votes

An approach using replace():

library(dplyr)

df %>% 
  mutate(manufacturer = replace(manufacturer, is.na(manufacturer) & 
           condition == "YES", na.omit(unique(manufacturer))), .by = code)
# A tibble: 9 × 4
  code   cost manufacturer condition
  <chr> <dbl> <chr>        <chr>    
1 A      5000 ManA         NO       
2 A      4000 NA           NO       
3 A      3000 NA           NO       
4 A      2000 NA           NO       
5 B     40000 ManB         YES      
6 B     30000 ManB         YES      
7 B     20000 ManB         YES      
8 B     10000 ManB         YES      
9 B      5000 ManB         YES
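
One caveat with the replace() approach: na.omit(unique(manufacturer)) is only a single value when each group contains exactly one distinct manufacturer, as in the example data. The sketch below is a hedged variant that always picks the first non-missing value in the group. The data frame df2 is hypothetical and was made up to show a group with two manufacturers. Note that this is not the same as filling downward. It uses group_by() rather than .by so that it also runs on dplyr versions older than 1.1.0.

```r
library(dplyr)

# Hypothetical data: group B contains two distinct manufacturers
df2 <- tibble(
  code = c("B", "B", "B", "B"),
  manufacturer = c("ManB", NA, "ManC", NA),
  condition = c("YES", "YES", "YES", "YES")
)

result2 <- df2 %>%
  group_by(code) %>%
  mutate(manufacturer = replace(
    manufacturer,
    is.na(manufacturer) & condition == "YES",
    # First non-missing value in the group (NA if the group has none)
    manufacturer[!is.na(manufacturer)][1]
  )) %>%
  ungroup()

result2
```

If the NA values should instead take the most recent preceding value, a downward fill is the better fit.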