Lagging pooled cross-sectional data in R

Question

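I have pooled cross-sectional data for different states at different points in time. I am trying to understand how weather events during the growing season affect a crop's current price. To do this, I want to lag the precipitation variable by 1, 2, and 3 months and add these new variables to df. My current df looks like this: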

states<-c("MA","MA","NH","NY")
dates<-c("05/27/21","05/25/21","05/28/21","05/24/21")
price<-c("1.30","1.28","1.40","3")
precipitation<-c("0.5","1.7","1.5","2")
df<-data.frame(states,dates,price,precipitation)

I have complete precipitation time series data for each state used in df, so that precipitation can be matched to a given date:

state<-c("NH","NH","NH","NH","NH","NY","NY",
         "NY","NY","NY","MA","MA","MA","MA","MA")
date<-c("03/24/2021","03/25/2021","03/26/2021","03/27/2021",
        "03/28/2021","04/24/21","04/25/21","04/26/21","04/27/21",
        "04/28/21")

 precip<-c("0.5","0.2","2","4.5","3","0.7","0.3","1","0.9",
              "2","2","0.3","2.5","2.6","3.1","2.1","3","0.4",
              "0.3","0.7","2.3","2.1","6","3.8","3.6","1","1.3",
              "2.1","3.4","7")
ts<-data.frame(state,date,precip)

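I want to lag the dates in df by 1, 2, and 3 months, then look back into the ts dataset and attach to each current date the precipitation recorded on the corresponding earlier date. This may be poorly worded, but for the one-month and two-month lags it would look like this: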

df$precip_1<-c("3.4","1.3","0.7","0.7")
df$precip_2<-c("2.6","0.3","3","2.3")

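I tried df$precip_1 <- lag(as.xts(df$precipitation), k=1), but I don't think this is right, because df is not time series data. I am considering creating new variables that hold the lagged dates, matching them directly against precip, and then adding the result to df. I have never dealt with lags before, so any help would be appreciated!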

r dataframe
1 Answer

I would consider using the lubridate package for this. It lets you work with dates in a very powerful way.

Here is my attempt:

library(dplyr)
library(lubridate)

dat <- data.frame(Date = c("05/27/21", "05/13/21", "04/27/21", "04/27/21"),
                  State = c("MA", "MA", "MA", "CT"), Price = c(1.3, 1.28, 1, 6),
                  Precipitation = c(0.5, 1.7, 9000, 2))


dat <- dat %>%
  mutate(Date = mdy(Date),               # parse the strings into Date objects
         lag = Date %m-% months(1)) %>%  # date one month earlier; change the number or unit as needed
  # Left join the data frame to itself: match each row's lagged date and state
  # to a row's actual date and state. Add more conditions to join_by() if needed.
  # Note: join_by() requires dplyr >= 1.1.0.
  left_join(., select(., Price, Date, State),
            by = join_by(lag == Date, State == State),
            suffix = c("", "_Lagged")) %>%
  select(-lag)                           # drop the helper column

dat
        Date State Price Precipitation Price_Lagged
1 2021-05-27    MA  1.30           0.5            1
2 2021-05-13    MA  1.28           1.7           NA
3 2021-04-27    MA  1.00        9000.0           NA
4 2021-04-27    CT  6.00           2.0           NA

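First we use the mdy() function to convert the strings to Date objects. Then we subtract a month with the %m-% operator. Then we use a left join to match the lagged dates against the original dates, making sure we are also looking at the correct state. (Basically, we are merging the data frame with a copy of itself here. There may be a cleaner way to do this with mutate or something similar, but I don't know how.) The suffix argument just determines what the identically named columns from each data frame will be called. The two select() calls are only for cleanup and are not strictly necessary. If there is no corresponding data for a given date, left_join() will simply insert NA.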
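The same idea can be pointed at a separate precipitation table instead of a self-join, which is closer to what the question asks for. The following is only a minimal sketch under some assumptions: obs and precip_ts are hypothetical stand-ins for the question's df and ts, the values are made up for illustration, the columns are numeric rather than character, and the lookup table holds at most one row per state and date. The helper add_precip_lag() is also a name invented here, not part of any package.

```r
library(dplyr)
library(lubridate)

# Hypothetical observations, shaped like the question's df (made-up values)
obs <- data.frame(
  state = c("MA", "MA", "NH"),
  date  = mdy(c("05/27/21", "05/25/21", "05/28/21")),
  price = c(1.30, 1.28, 1.40)
)

# Hypothetical precipitation lookup, shaped like the question's ts (made-up values)
precip_ts <- data.frame(
  state  = c("MA", "MA", "MA", "NH", "NH"),
  date   = mdy(c("04/27/21", "03/27/21", "04/25/21", "04/28/21", "02/28/21")),
  precip = c(3.4, 2.6, 1.3, 0.7, 3.0)
)

# Hypothetical helper: add a column precip_<k> holding the precipitation
# recorded k months before each observation date, in the same state.
add_precip_lag <- function(data, lookup, k) {
  lagged <- lookup %>%
    rename(lag_date = date, !!paste0("precip_", k) := precip)
  data %>%
    mutate(lag_date = date %m-% months(k)) %>%       # date k months earlier
    left_join(lagged, by = c("state", "lag_date")) %>% # NA when no match exists
    select(-lag_date)                                 # drop the helper column
}

# Apply the helper for lags of 1, 2 and 3 months
result <- obs
for (k in 1:3) result <- add_precip_lag(result, precip_ts, k)
result
```

Dates with no matching row in the lookup table come back as NA, just like in the self-join above. One detail worth knowing: when the target day does not exist in the earlier month (for example, one month before March 31), %m-% rolls back to the last day of that month rather than returning NA.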
