How to impute observations of a variable in a list of data frames? (dyadic time series)

Question

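I have several separate csv files, each covering a specific country pair and its trade volume from 1870 to 2020 (using the COW trade dataset, here the smoothtotrade variable). Unfortunately, the dataset only runs until 2014, so all later values are set to NA.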

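After trying several ways to impute/forecast the missing data, I decided it is best to simply carry forward the last available value (i.e., the 2014 smoothtotrade value). However, I can't get it to work. I have been using the imputeTS package and its na_locf function. Can anyone help?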
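"Carrying the last value forward" is last observation carried forward (LOCF): each NA takes the most recent non-missing value that precedes it in the vector. Below is a minimal base-R sketch of that rule using a hypothetical helper `locf()` (not part of imputeTS or any other package). imputeTS's `na_locf(x, option = "locf")` is meant for the same job, although its handling of leading NAs may differ from this sketch.

```r
# Hypothetical helper (not from any package): last observation carried forward.
# Each NA takes the most recent earlier non-missing value; leading NAs stay NA.
locf <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L          # missing positions point nowhere
  idx <- cummax(idx)           # index of the last non-missing value seen so far
  out <- x
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

# smoothtotrade values for 2010-2020, missing after 2014
trade <- c(11.484859, 10.393110, 6.902980, 4.058900, 9.018300, rep(NA, 6))
locf(trade)  # 2015-2020 all become 9.0183, the 2014 value
```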

The list of data frames is called data_frames. My current code:

library(imputeTS)

# Imputation function: carry the last non-missing value forward (LOCF)
impute_smoothtotrade <- function(ts_data) {
  ts_data_imputed <- na_locf(ts_data, option = "locf")
  return(ts_data_imputed)
}

# Loop through each data frame (time series) in the list
for (i in seq_along(data_frames)) {
  data_frames[[i]]$smoothtotrade <- impute_smoothtotrade(data_frames[[i]]$smoothtotrade)
}

Here is the result for a random country pair. It clearly shows that the 2014 value (9.018300) was not carried forward as expected:

51    AUT    CMR 2010     11.484859  
52    AUT    CMR 2011     10.393110  
53    AUT    CMR 2012      6.902980  
54    AUT    CMR 2013      4.058900  
55    AUT    CMR 2014      9.018300  
89    AUT    CMR 2015      2.582298  
90    AUT    CMR 2016      2.582298  
91    AUT    CMR 2017      2.582298  
92    AUT    CMR 2018      2.582298  
93    AUT    CMR 2019      2.582298  
94    AUT    CMR 2020      2.582298
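One possible cause, and this is only an assumption based on the printout: the row names jump from 55 to 89, so the rows may not be stored in year order. LOCF works in row order, so if a row for some other year sits directly before the 2015 row, that row's value is what gets carried forward. The sketch below sorts by year before filling; `locf()` and `impute_sorted()` are hypothetical helpers and the numbers are made up.

```r
# Hypothetical helper: last observation carried forward (leading NAs stay NA)
locf <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L
  idx <- cummax(idx)
  out <- x
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

# Hypothetical helper: order one data frame by year, then carry values forward
impute_sorted <- function(df) {
  df <- df[order(df$year), ]
  df$smoothtotrade <- locf(df$smoothtotrade)
  df
}

# Made-up country pair whose 2000 row is stored just before the NA rows
df <- data.frame(year = c(2013, 2014, 2000, 2015, 2016),
                 smoothtotrade = c(4.0589, 9.0183, 1.5, NA, NA))

locf(df$smoothtotrade)             # row order: 2015/2016 get 1.5 (the 2000 value)
impute_sorted(df)$smoothtotrade    # year order: 2015/2016 get 9.0183

# Applied to a whole list of data frames
data_frames <- lapply(list(df), impute_sorted)
```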

Answer 1

Two options (of many):

Sample data

# Sample data frames and the data_frames list
df1 <- data.frame(country = c(rep("AAA", 11)), year = 2010:2020,
                  smoothtotrade = c(11.484859, 10.393110, 6.902980, 4.058900, 9.018300, rep(NA, 6)))

df2 <- data.frame(country = c(rep("BBB", 11)), year = 2010:2020,
                  smoothtotrade = c(12.484859, 1.393110, 3.902980, 8.058900, 5.018300, rep(NA, 6)))

df3 <- data.frame(country = c(rep("CCC", 11)), year = 2010:2020,
                  smoothtotrade = c(8.484859, 9.393110, 10.902980, 9.058900, 8.018300, rep(NA, 6)))

data_frames <- list(df1, df2, df3)

Option 1: Using the dplyr and tidyr packages

library(dplyr)
library(tidyr)

# Combine all data frames into a single one
df4 <- bind_rows(data_frames, .id = "column_label")

# Within each country, fill NAs downward with the last non-missing value
result <- df4 %>%
  group_by(country) %>%
  fill(smoothtotrade, .direction = "down") %>%
  ungroup()

result
# A tibble: 33 × 4
   column_label country  year smoothtotrade
   <chr>        <chr>   <int>         <dbl>
 1 1            AAA      2010          11.5 
 2 1            AAA      2011          10.4 
 3 1            AAA      2012          6.90
 4 1            AAA      2013          4.06
 5 1            AAA      2014          9.02
 6 1            AAA      2015          9.02
 7 1            AAA      2016          9.02
 8 1            AAA      2017          9.02
 9 1            AAA      2018          9.02
10 1            AAA      2019          9.02
# ℹ 23 more rows
# ℹ Use `print(n = ...)` to see more rows
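The `group_by(country)` step is what keeps one country's last value from spilling into the next country's leading NAs, and `fill()` works in row order, so the years should already be sorted within each country. For readers without dplyr/tidyr, a rough base-R equivalent applies a LOCF helper per country with `ave()`. `locf()` is a hypothetical helper, and the data is a trimmed version of the sample data with BBB's 2012 value deliberately set to NA to show the grouping effect.

```r
# Hypothetical helper: last observation carried forward (leading NAs stay NA)
locf <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L
  idx <- cummax(idx)
  out <- x
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

# Trimmed sample data; BBB's first value is NA on purpose
df1 <- data.frame(country = "AAA", year = 2012:2016,
                  smoothtotrade = c(6.902980, 4.058900, 9.018300, NA, NA))
df2 <- data.frame(country = "BBB", year = 2012:2016,
                  smoothtotrade = c(NA, 8.058900, 5.018300, NA, NA))

# Stack the data frames and put years in order within each country
df4 <- do.call(rbind, list(df1, df2))
df4 <- df4[order(df4$country, df4$year), ]

# Fill downward separately within each country
df4$smoothtotrade <- ave(df4$smoothtotrade, df4$country, FUN = locf)
df4
```

BBB's 2012 row stays NA instead of inheriting 9.0183 from AAA, which matches what the grouped `fill()` does.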

Option 2: Using your original approach (a for loop)

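for (i in seq_along(data_frames)) {
  # Replace every NA with the last non-missing value of smoothtotrade
  # (here the 2014 value); non-missing values are kept unchanged
  data_frames[[i]]$smoothtotrade <- 
    ifelse(is.na(data_frames[[i]]$smoothtotrade),
           data_frames[[i]]$smoothtotrade[max(which(!is.na(data_frames[[i]]$smoothtotrade)))],
           data_frames[[i]]$smoothtotrade)
}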

data_frames
[[1]]
   country year smoothtotrade
1      AAA 2010      11.48486
2      AAA 2011      10.39311
3      AAA 2012       6.90298
4      AAA 2013       4.05890
5      AAA 2014       9.01830
6      AAA 2015       9.01830
7      AAA 2016       9.01830
8      AAA 2017       9.01830
9      AAA 2018       9.01830
10     AAA 2019       9.01830
11     AAA 2020       9.01830

[[2]]
   country year smoothtotrade
1      BBB 2010      12.48486
2      BBB 2011       1.39311
3      BBB 2012       3.90298
4      BBB 2013       8.05890
5      BBB 2014       5.01830
6      BBB 2015       5.01830
7      BBB 2016       5.01830
8      BBB 2017       5.01830
9      BBB 2018       5.01830
10     BBB 2019       5.01830
11     BBB 2020       5.01830

[[3]]
   country year smoothtotrade
1      CCC 2010      8.484859
2      CCC 2011      9.393110
3      CCC 2012     10.902980
4      CCC 2013      9.058900
5      CCC 2014      8.018300
6      CCC 2015      8.018300
7      CCC 2016      8.018300
8      CCC 2017      8.018300
9      CCC 2018      8.018300
10     CCC 2019      8.018300
11     CCC 2020      8.018300
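Option 2 is not strictly LOCF: it replaces every NA with the last non-missing value of the whole column, not with the value just before it. With only trailing NAs, as in these series after 2014, both rules give the same result; with an interior gap they differ. A small comparison on a made-up vector, with `locf()` again a hypothetical helper:

```r
x <- c(1, NA, 3, 4, NA, NA)

# Option 2's rule: every NA gets the column's last non-missing value
option2 <- ifelse(is.na(x), x[max(which(!is.na(x)))], x)

# Hypothetical helper: strict LOCF, each NA gets the most recent earlier value
locf <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L
  idx <- cummax(idx)
  out <- x
  out[idx > 0] <- x[idx[idx > 0]]
  out
}
strict <- locf(x)

option2  # 1 4 3 4 4 4
strict   # 1 1 3 4 4 4
```

If a country pair could also have gaps before 2014, the grouped `fill()` from Option 1 (or a strict LOCF like the one above) may be the better fit.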
