Transforming a data frame [duplicate]

Question    Votes: 0    Answers: 3

This question has been marked as a duplicate and already has an answer elsewhere.

M     Product   Price
-------------------------
2014m1  Pepsi   55
2014m1  Coke    60
2014m2  Pepsi   55
2014m2  Coke    62
2014m3  Pepsi   55
2014m3  Coke    63
2014m4  Pepsi   55
2014m5  Pepsi   55
2014m6  Pepsi   55
2014m8  Pepsi   58
2014m9  Pepsi   58
2014m10 Pepsi   58
2014m11 Pepsi   58
2014m12 Pepsi   58

I have time series for two products, Pepsi and Coke. I want to transform this table into the one below.

M     Product Price
--------------------------
2014m1  Coke    60
2014m2  Coke    62
2014m3  Coke    63
2014m4  Coke    NA
2014m5  Coke    NA
2014m6  Coke    NA
2014m7  Coke    NA
2014m8  Coke    NA
2014m9  Coke    NA
2014m10 Coke    NA
2014m11 Coke    NA
2014m12 Coke    NA
2014m1  Pepsi   55
2014m2  Pepsi   55
2014m3  Pepsi   55
2014m4  Pepsi   55
2014m5  Pepsi   55
2014m6  Pepsi   55
2014m7  Pepsi   58
2014m8  Pepsi   58
2014m9  Pepsi   58
2014m10 Pepsi   58
2014m11 Pepsi   58
2014m12 Pepsi   58

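That is, in this table every product has a row for every month, with its price (or NA where no price is known). Can someone help me transform the table this way?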

r dataframe dplyr
3 Answers
1
vote

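Here is a more flexible solution using tidyr::expand. You do not have to specify the number of months to generate (12 in your case), because we extract the highest month number from M with sub.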

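library(tidyverse)

my_df %>% 
 # highest month number present in the data (12 here)
 mutate(val = max(as.integer(sub('.*m', '', M)))) %>% 
 group_by(Product) %>% 
 # build every month label 1..val for each product
 expand(M = paste0('2014m', seq(val[1]))) %>% 
 # bring the known prices back; months without a price become NA
 left_join(my_df, by = c('Product', 'M'))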

which gives,

# A tibble: 24 x 3
# Groups:   Product [?]
   Product M       Price
   <chr>   <chr>   <int>
 1 Coke    2014m1     60
 2 Coke    2014m10    NA
 3 Coke    2014m11    NA
 4 Coke    2014m12    NA
 5 Coke    2014m2     62
 6 Coke    2014m3     63
 7 Coke    2014m4     NA
 8 Coke    2014m5     NA
 9 Coke    2014m6     NA
10 Coke    2014m7     NA
# ... with 14 more rows
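
Note that the rows above are sorted as character strings, so 2014m10 comes before 2014m2. A minimal base R sketch of restoring calendar order follows; it assumes all labels belong to a single year, and month_number is a hypothetical helper name introduced only for this illustration:

```r
# Hypothetical helper: extract the numeric month from labels such as "2014m10"
month_number <- function(m) as.integer(sub(".*m", "", m))

# Labels in the string order shown in the output above
labels <- c("2014m1", "2014m10", "2014m11", "2014m12", "2014m2", "2014m3")

# Reorder by the numeric month instead of by the string
sorted_labels <- labels[order(month_number(labels))]
print(sorted_labels)
```

In the pipeline above, the same idea could probably be applied by appending arrange(Product, as.integer(sub('.*m', '', M))) as a final step.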

2
votes

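You can use complete from tidyr. First convert M to a factor whose levels are all the months you want in the data, then use complete to fill in the missing month/product combinations.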

library(dplyr)
library(tidyr)

my_df %>% 
  mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
  complete(M, Product)

# A tibble: 24 x 3
#    M      Product Price
#    <fct>  <chr>   <int>
#  1 2014m1 Coke       60
#  2 2014m1 Pepsi      55
#  3 2014m2 Coke       62
#  4 2014m2 Pepsi      55
#  5 2014m3 Coke       63
#  6 2014m3 Pepsi      55
#  7 2014m4 Coke       NA
#  8 2014m4 Pepsi      55
#  9 2014m5 Coke       NA
# 10 2014m5 Pepsi      55
# ... with 14 more rows
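
The result above is ordered by month first, while the layout asked for in the question lists all months of one product before the next. A minimal sketch of one way to get that layout is to add arrange(Product, M), which relies on the factor levels for calendar order. The sample data is rebuilt here only so the snippet runs on its own, and res is just an illustrative name. Because the sample data has no 2014m7 row for Pepsi, that row comes out as NA:

```r
library(dplyr)
library(tidyr)

# Same sample data as in this answer, rebuilt so the snippet is self-contained
my_df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              rep("Pepsi", 8)),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

res <- my_df %>%
  # factor levels define both the full set of months and their order
  mutate(M = factor(M, levels = paste0(2014, "m", 1:12))) %>%
  # add the missing month/product combinations with Price = NA
  complete(M, Product) %>%
  # group rows by product; within a product, M sorts by factor level
  arrange(Product, M)

print(res, n = 24)
```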

Data

my_df <- structure(list(M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3", 
                     "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10", 
                     "2014m11", "2014m12"), 
               Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke", 
                           "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi", "Pepsi",
                           "Pepsi", "Pepsi"), 
               Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L, 
                         58L, 58L, 58L)), 
          class = "data.frame", row.names = c(NA, -14L))

1
vote

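One way to do this is to create a new data frame containing all possible month/product combinations and then merge it with the original data frame.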

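# every month paired with every product (M is recycled to match Product)
new_df <- data.frame(M = paste0(2014, "m", seq(12)), 
         Product = rep(unique(df$Product), each = 12))

# left join keeps all combinations; missing prices become NA
merge(new_df, df, all.x = TRUE)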


#         M  Product Price
#1   2014m1    Coke    60
#2   2014m1   Pepsi    55
#3   2014m10   Coke    NA
#4   2014m10  Pepsi    58
#5   2014m11   Coke    NA
#6   2014m11  Pepsi    58
#7   2014m12   Coke    NA
#8   2014m12  Pepsi    58
#9   2014m2    Coke    62
#10  2014m2   Pepsi    55
......

Here df is your original data frame.
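
As a variation on the same idea, the combinations can also be built with base R's expand.grid, and the merged result can then be reordered by product and numeric month to match the layout in the question. This is only a sketch under the assumption that all months belong to 2014. The names all_combos and res are illustrative, and the sample data is rebuilt so the snippet runs on its own:

```r
# Same sample data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  M = c("2014m1", "2014m1", "2014m2", "2014m2", "2014m3", "2014m3",
        "2014m4", "2014m5", "2014m6", "2014m8", "2014m9", "2014m10",
        "2014m11", "2014m12"),
  Product = c("Pepsi", "Coke", "Pepsi", "Coke", "Pepsi", "Coke",
              rep("Pepsi", 8)),
  Price = c(55L, 60L, 55L, 62L, 55L, 63L, 55L, 55L, 55L, 58L, 58L,
            58L, 58L, 58L),
  stringsAsFactors = FALSE
)

# All month/product combinations (12 months x 2 products = 24 rows)
all_combos <- expand.grid(
  M = paste0("2014m", 1:12),
  Product = unique(df$Product),
  stringsAsFactors = FALSE
)

# Left join on the shared columns M and Product; missing prices become NA
res <- merge(all_combos, df, all.x = TRUE)

# merge() sorts the keys as strings, so reorder by product, then numeric month
res <- res[order(res$Product, as.integer(sub(".*m", "", res$M))), ]
rownames(res) <- NULL

print(res)
```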
