I have a data frame like this:
df = data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))
Rows 1:5 are day 1 and rows 6:9 are day 2; where a value in a column repeats the one above it, it is NA.
I tried
df %>%
  pivot_wider(names_from = Problem, values_from = duration)
but that did not work. The df I expect looks like this:
df1 = data.frame(day = c("1", "1", "2", "2"),
                 Unit = c("unit1", "unit2", "unit1", "unit2"),
                 Oil = c(2, 0, 1.5, 0),
                 Engine = c(5, 0, 0, 0),
                 Electric = c(1, 0, 0, 0),
                 Power = c(0, 0, 3, 0),
                 NoProblem = c(0, 0, 0, 0))
Try
library(dplyr)
library(tidyr)
df %>%
  fill(day, Unit, Problem) %>%
  distinct(day, Unit, Problem, .keep_all = TRUE) %>%
  mutate(duration = replace_na(duration, 0)) %>%
  pivot_wider(names_from = Problem, values_from = duration,
              values_fill = 0) %>%
  mutate(NoProblem = 0)
-output
# A tibble: 4 × 7
  day   Unit    Oil Engine Electric Power NoProblem
  <chr> <chr> <dbl>  <dbl>    <dbl> <dbl>     <dbl>
1 1     unit1   2        5        1     0         0
2 1     unit2   0        0        0     0         0
3 2     unit1   1.5      0        0     3         0
4 2     unit2   0        0        0     0         0
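A variation on the same idea, in case filling Problem downward feels fragile: fill only day and Unit, then label the remaining NA problems explicitly, so that the NoProblem column comes out of the pivot itself. This is only a sketch, and it assumes that every NA in Problem means "no problem recorded". The label "NoProblem" is taken from the expected output, and relocate() is only there to move that column to the end.

```r
library(dplyr)
library(tidyr)

df <- data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                 Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                 Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                 duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

res <- df %>%
  # carry day and Unit downward; Problem is left alone
  fill(day, Unit) %>%
  # assumption: an NA Problem means no problem was recorded for that row
  mutate(Problem = replace_na(Problem, "NoProblem"),
         duration = replace_na(duration, 0)) %>%
  # guard against repeated day/Unit/Problem combinations before pivoting
  distinct(day, Unit, Problem, .keep_all = TRUE) %>%
  pivot_wider(names_from = Problem, values_from = duration,
              values_fill = 0) %>%
  # move the NoProblem column to the end, as in the expected df1
  relocate(NoProblem, .after = last_col())

res
```

Since the rows labelled NoProblem all get duration 0 here, the column is all zeros for this data, the same as the hard-coded mutate(NoProblem = 0).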
Basically, first fill day and Unit using zoo::na.locf, then reshape.
df[1:2] <- zoo::na.locf(df[1:2])
(res <- reshape(df, direction='wide', idvar=c('Unit', 'day'), timevar='Problem'))
# day Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1 1 unit1 2.0 5 1 NA NA
# 5 1 unit2 NA NA NA NA NA
# 6 2 unit1 1.5 NA NA NA 3
# 9 2 unit2 NA NA NA NA NA
A warning is thrown here, because unit2 has no problems. The NAs can be replaced with 0,
replace(res, is.na(res), 0)
# day Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1 1 unit1 2.0 5 1 0 0
# 5 1 unit2 0.0 0 0 0 0
# 6 2 unit1 1.5 0 0 0 3
# 9 2 unit2 0.0 0 0 0 0
but doing so isn't really right, and the NAs don't actually matter, since you can often use na.rm when doing calculations, e.g.
by(res[3:7], res$Unit, colSums, na.rm=TRUE)
# res$Unit: unit1
# duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 3.5 5.0 1.0 0.0 3.0
# ------------------------------------------------------------------------------------------------------
# res$Unit: unit2
# duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 0 0 0 0 0
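If the duration.NA column and the "duration." prefix get in the way, they can be tidied up after the reshape. A minimal sketch, assuming the same df as in the question and that the zoo package is installed. It keeps the NAs as they are, and suppressWarnings() is only there to silence the warning mentioned above.

```r
df <- data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                 Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                 Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                 duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

# fill day and Unit downward, as above
df[1:2] <- zoo::na.locf(df[1:2])

res <- suppressWarnings(
  reshape(df, direction = 'wide', idvar = c('Unit', 'day'), timevar = 'Problem')
)

# drop the column that comes from the rows with no Problem
res$duration.NA <- NULL

# strip the "duration." prefix that reshape adds to the value columns
names(res) <- sub('^duration\\.', '', names(res))

res
```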