M Product Price
-------------------------
2014m1 Pepsi 55
2014m1 Coke 60
2014m2 Pepsi 55
2014m2 Coke 62
2014m3 Pepsi 55
2014m3 Coke 63
2014m4 Pepsi 55
2014m5 Pepsi 55
2014m6 Pepsi 55
2014m8 Pepsi 58
2014m9 Pepsi 58
2014m10 Pepsi 58
2014m11 Pepsi 58
2014m12 Pepsi 58
I have a time series for two products, Pepsi and Coke. I want to transform this table into the one below.
M Product Price
--------------------------
2014m1 Coke 60
2014m2 Coke 62
2014m3 Coke 63
2014m4 Coke NA
2014m5 Coke NA
2014m6 Coke NA
2014m7 Coke NA
2014m8 Coke NA
2014m9 Coke NA
2014m10 Coke NA
2014m11 Coke NA
2014m12 Coke NA
2014m1 Pepsi 55
2014m2 Pepsi 55
2014m3 Pepsi 55
2014m4 Pepsi 55
2014m5 Pepsi 55
2014m6 Pepsi 55
2014m7 Pepsi 58
2014m8 Pepsi 58
2014m9 Pepsi 58
2014m10 Pepsi 58
2014m11 Pepsi 58
2014m12 Pepsi 58
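That is, in this table every product has a row for every month, with its price where one is known. Could someone help me transform the table this way?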
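Here is a more flexible solution using tidyr::expand. You do not have to specify the number of months to generate (12 in your case), because we extract it from M with sub.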
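library(tidyverse)

my_df %>%
  # highest month number present in the data (12 here)
  mutate(val = max(as.integer(sub('.*m', '', M)))) %>%
  group_by(Product) %>%
  # one row per product for every month from 1 to val
  expand(M = paste0('2014m', seq(val[1]))) %>%
  # bring the known prices back; months without a price get NA
  left_join(my_df, by = c("Product", "M"))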
which gives,
# A tibble: 24 x 3
# Groups:   Product [?]
   Product M       Price
   <chr>   <chr>   <int>
 1 Coke    2014m1     60
 2 Coke    2014m10    NA
 3 Coke    2014m11    NA
 4 Coke    2014m12    NA
 5 Coke    2014m2     62
 6 Coke    2014m3     63
 7 Coke    2014m4     NA
 8 Coke    2014m5     NA
 9 Coke    2014m6     NA
10 Coke    2014m7     NA
# ... with 14 more rows
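You can use complete from tidyr. First convert M to a factor whose levels are all the months you want in the data, then use complete to fill in the missing M/Product combinations.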
my_df %>%
mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
complete(M, Product)
# A tibble: 24 x 3
# M Product Price
# <fct> <chr> <int>
# 1 2014m1 Coke 60
# 2 2014m1 Pepsi 55
# 3 2014m2 Coke 62
# 4 2014m2 Pepsi 55
# 5 2014m3 Coke 63
# 6 2014m3 Pepsi 55
# 7 2014m4 Coke NA
# 8 2014m4 Pepsi 55
# 9 2014m5 Coke NA
# 10 2014m5 Pepsi 55
# ... with 14 more rows
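The result above is ordered by month first. If you want the layout from the question (all Coke rows, then all Pepsi rows, each in calendar order), a minimal sketch is to add arrange after complete. This assumes the sample data from this thread; the name result and the final select are only for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from this thread, rebuilt so the snippet runs on its own
my_df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
              "Pepsi", "Pepsi"),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

result <- my_df %>%
  # factor levels fix the calendar order and include the missing 2014m7
  mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
  # one row per M/Product combination; unknown prices become NA
  complete(M, Product) %>%
  # sort by product, then by factor level (not alphabetically)
  arrange(Product, M) %>%
  select(M, Product, Price)

print(result, n = 24)
```

Because M is a factor here, arrange sorts it by level, so 2014m10 comes after 2014m9 rather than after 2014m1.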
Data
my_df <- structure(list(M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
"2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
"2014m11", "2014m12"),
Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
"Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
"Pepsi", "Pepsi"),
Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
58L, 58L, 58L)),
class = "data.frame", row.names = c(NA, -14L))
One way is to create a new data frame containing all possible combinations and then merge it with the original data frame:
new_df <- data.frame(M = paste0(2014, "m", seq(12)),
Product = rep(unique(df$Product), each = 12))
merge(new_df, df, all.x = TRUE)
# M Product Price
#1 2014m1 Coke 60
#2 2014m1 Pepsi 55
#3 2014m10 Coke NA
#4 2014m10 Pepsi 58
#5 2014m11 Coke NA
#6 2014m11 Pepsi 58
#7 2014m12 Coke NA
#8 2014m12 Pepsi 58
#9 2014m2 Coke 62
#10 2014m2 Pepsi 55
......
Here df is your original data frame.
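Note that merge sorts M as text, so 2014m10 comes right after 2014m1. If you need the rows grouped by product and in calendar order, as in the question, one possible base R sketch is to order by the numeric part of M. Here expand.grid is swapped in for the manual rep call to build the combinations, and the names merged and month_num are only illustrative.

```r
# Sample data from this thread, rebuilt so the snippet runs on its own
df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
              "Pepsi", "Pepsi"),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

# Every month/product combination (12 months x 2 products = 24 rows)
new_df <- expand.grid(M = paste0(2014, "m", seq(12)),
                      Product = unique(df$Product),
                      stringsAsFactors = FALSE)

# Keep every combination; months without a price get NA
merged <- merge(new_df, df, all.x = TRUE)

# Extract the month number after the "m" and sort on it numerically
month_num <- as.integer(sub(".*m", "", merged$M))
merged <- merged[order(merged$Product, month_num), ]
rownames(merged) <- NULL

merged
```

Sorting on the extracted integer avoids having to convert M to a factor first.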