How to use pivot_wider when the data frame contains NA values


I have a data frame like this:

df = data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

Rows 1:5 are day 1 and rows 6:9 are day 2. When a value in a column repeats the one above it, it is recorded as NA.

I tried:

df %>% 
  pivot_wider(names_from = Problem, values_from = duration)

But it didn't work. The df I expect looks like this:

df1 = data.frame(day = c("1", "1", "2", "2"),
                 Unit = c("unit1", "unit2", "unit1", "unit2"),
                 Oil = c(2, 0, 1.5, 0),
                 Engine = c(5, 0, 0, 0),
                 Electric = c(1, 0, 0, 0),
                 Power = c(0, 0, 3, 0),
                 NoProblem = c(0, 0, 0, 0))
r tidyr data-manipulation
2 answers
0 votes

Try:

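library(dplyr)
library(tidyr)
df %>% 
  # carry the last non-NA value down each of the three columns
  fill(day, Unit, Problem) %>%
  # keep only the first row for each day/Unit/Problem combination
  distinct(day, Unit, Problem, .keep_all = TRUE) %>% 
  # rows without a recorded duration count as 0
  mutate(duration = replace_na(duration, 0)) %>% 
  pivot_wider(names_from = Problem, values_from = duration, 
   values_fill = 0) %>% 
  # constant column added to match the expected output
  mutate(NoProblem = 0)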

Output:

# A tibble: 4 × 7
  day   Unit    Oil Engine Electric Power NoProblem
  <chr> <chr> <dbl>  <dbl>    <dbl> <dbl>     <dbl>
1 1     unit1   2        5        1     0         0
2 1     unit2   0        0        0     0         0
3 2     unit1   1.5      0        0     3         0
4 2     unit2   0        0        0     0         0
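
As a variation on the pipeline above (a sketch only, not the answerer's method), the NA entries in Problem can be labelled explicitly instead of being filled from the row above, so that the NoProblem column comes out of the pivot itself. The label "NoProblem" is taken from the expected output in the question; `relocate()` assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
  Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
  Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
  duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA)
)

wide <- df %>%
  # fill only the grouping columns; Problem is left alone
  fill(day, Unit) %>%
  # label rows without a problem and give them a duration of 0
  mutate(Problem = replace_na(Problem, "NoProblem"),
         duration = replace_na(duration, 0)) %>%
  pivot_wider(names_from = Problem, values_from = duration,
              values_fill = 0) %>%
  # move NoProblem to the end to match the requested column order
  relocate(NoProblem, .after = last_col())

wide
```

Because Problem is never filled downward, unit2 is not temporarily assigned a problem that belongs to unit1.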

0 votes

Basically, first fill day and Unit using zoo::na.locf, then reshape.

df[1:2] <- zoo::na.locf(df[1:2])
(res <- reshape(df, direction='wide', idvar=c('Unit', 'day'), timevar='Problem'))
#   day  Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1   1 unit1          2.0               5                 1          NA             NA
# 5   1 unit2           NA              NA                NA          NA             NA
# 6   2 unit1          1.5              NA                NA          NA              3
# 9   2 unit2           NA              NA                NA          NA             NA

This throws a warning here because some rows have no Problem (unit2, for example, has none).

The NAs can be replaced with 0:

replace(res, is.na(res), 0)
#   day  Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1   1 unit1          2.0               5                 1           0              0
# 5   1 unit2          0.0               0                 0           0              0
# 6   2 unit1          1.5               0                 0           0              3
# 9   2 unit2          0.0               0                 0           0              0

Doing so is not really correct, though, and the NAs don't actually matter, because you can often use na.rm when doing calculations, e.g.:

by(res[3:7], res$Unit, colSums, na.rm=TRUE)
# res$Unit: unit1
#     duration.Oil   duration.Engine duration.Electric       duration.NA    duration.Power 
#              3.5               5.0               1.0               0.0               3.0 
# ------------------------------------------------------------------------------------------------------ 
# res$Unit: unit2
#     duration.Oil   duration.Engine duration.Electric       duration.NA    duration.Power 
#                0                 0                 0                 0                 0 
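
If the zero-filled layout from the question is still wanted, the reshape result can be tidied with a few base R steps. The following is a sketch under the assumption that the columns are named as in the output above (duration.Oil, duration.NA, and so on); it does not add the NoProblem column.

```r
# Same data as in the question
df <- data.frame(
  day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
  Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
  Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
  duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA)
)

# Fill day and Unit downward, then reshape to wide as in the answer;
# the warning about rows without a Problem is silenced here
df[1:2] <- zoo::na.locf(df[1:2])
res <- suppressWarnings(
  reshape(df, direction = "wide", idvar = c("Unit", "day"), timevar = "Problem")
)

# Drop the column created for the NA level of Problem
res <- res[names(res) != "duration.NA"]
# Strip the "duration." prefix so the columns are named after the problems
names(res) <- sub("^duration\\.", "", names(res))
# Replace the remaining NAs with 0
res[is.na(res)] <- 0

res
```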