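I have pooled cross-sectional data from different states at different points in time. I am trying to understand how weather events during the growing season affect current crop prices. To do that, I want to lag the precipitation variable by 1, 2, and 3 months and add these new variables to df. My current df looks like this: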
states<-c("MA","MA","NH","NY")
dates<-c("05/27/21","05/25/21","05/28/21","05/24/21")
price<-c("1.30","1.28","1.40","3")
precipitation<-c("0.5","1.7","1.5","2")
df<-data.frame(states,dates,price,precipitation)
I also have complete precipitation time series for every state that appears in df, which I can use to match precipitation to a given date:
state<-c("NH","NH","NH","NH","NH","NY","NY",
"NY","NY","NY","MA","MA","MA","MA","MA")
date<-c("03/24/2021","03/25/2021","03/26/2021","03/27/2021",
"03/28/2021","04/24/21","04/25/21","04/26/21","04/27/21",
"04/28/21")
precip<-c("0.5","0.2","2","4.5","3","0.7","0.3","1","0.9",
"2","2","0.3","2.5","2.6","3.1","2.1","3","0.4",
"0.3","0.7","2.3","2.1","6","3.8","3.6","1","1.3",
"2.1","3.4","7")
ts<-data.frame(state,date,precip)
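I want to lag the dates in df by 1, 2, and 3 months, then look back into the ts dataset and match each current date with the precipitation recorded that many months earlier. That may be poorly worded, but for the one-month and two-month lags the result would look like this: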
df$precip_1<-c("3.4","1.3","0.7","0.7")
df$precip_2<-c("2.6","0.3","3","2.3")
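I tried
df$precip_1<- lag(as.xts(df$precipitation),k=1)
but I don't think that is right, because df is not time-series data. I am considering creating new variables that hold the lagged dates, matching them directly against precip, and then adding the result to df. I have never worked with lags before, so any help would be greatly appreciated!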
I would consider using the lubridate package for this. It lets you work with dates in a very powerful way.
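As a quick illustration of the two lubridate pieces used in the attempt below, here is a minimal sketch. The variable names (d, one_back, month_end) are made up for the example and are not part of the question's data.

```r
library(lubridate)

# Parse month/day/year text into a Date object
d <- mdy("05/27/21")

# Step back one calendar month with the %m-% operator
one_back <- d %m-% months(1)

# %m-% rolls back to the last valid day when the target day does not exist,
# so the end of March maps to the end of February
month_end <- mdy("03/31/21") %m-% months(1)
```

The rollback behaviour matters for lagging: a date near the end of a long month still gets a valid lagged date in a shorter month.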
Here is my attempt:
library(dplyr)
library(lubridate)
dat <- data.frame(Date = c("05/27/21", "05/13/21", "04/27/21", "04/27/21"),
State = c("MA", "MA", "MA", "CT"), Price = c(1.3, 1.28, 1, 6),
Precipitation = c(0.5, 1.7, 9000, 2))
dat <- dat %>%
  mutate(Date = mdy(Date),                 # parse the text dates into Date objects
         lag = Date %m-% months(1)) %>%    # date one month earlier; change the number or the unit as needed
  # Self-join: match each row's lagged date and state to a row with that actual date.
  # If you need more matching keys, add them to join_by().
  left_join(., select(., Price, Date, State),
            by = join_by(lag == Date, State == State),
            suffix = c("", "_Lagged")) %>%
  select(-lag)                             # drop the helper column
dat
        Date State Price Precipitation Price_Lagged
1 2021-05-27    MA  1.30           0.5            1
2 2021-05-13    MA  1.28           1.7           NA
3 2021-04-27    MA  1.00        9000.0           NA
4 2021-04-27    CT  6.00           2.0           NA
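First we use the mdy() function to convert the text dates to Date objects. Then we subtract months with the %m-% operator. Then we use a left join to match the lagged dates against the original dates, making sure we are also looking at the correct state. (Basically we are merging the data frame with a copy of itself here; there may be a cleaner way to do this with mutate or something similar, but I don't know how.) The suffix argument just determines what the columns coming from each data frame are called. The two select() calls are only there for cleanup and are not strictly necessary. If there is no corresponding data for a given date, left_join() simply inserts NA.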
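The example above lags Price within a single data frame, while the question asks for precipitation taken from a separate time-series table. A minimal sketch of how the same idea might extend to that case, for lags of 1, 2, and 3 months, is shown here. The toy data and the names prices, precip_ts, and add_precip_lag are hypothetical and only for illustration; the sketch also assumes both tables store dates as Date objects and that the precipitation table has at most one row per state and date.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data, not the asker's real series
prices <- data.frame(
  State = c("MA", "MA", "NH"),
  Date  = mdy(c("05/27/21", "05/25/21", "05/28/21")),
  Price = c(1.30, 1.28, 1.40)
)

precip_ts <- data.frame(
  State  = c("MA", "MA", "MA", "NH", "NH"),
  Date   = mdy(c("04/27/21", "03/27/21", "04/25/21", "04/28/21", "02/28/21")),
  Precip = c(3.4, 2.6, 1.3, 0.7, 2.3)
)

# Hypothetical helper: add a column precip_<k> holding the precipitation
# observed k months before each row's Date, matched by State
add_precip_lag <- function(df, ts, k) {
  # Lookup table keyed by State and the date we want to match against
  lookup <- data.frame(State = ts$State, lag_date = ts$Date, value = ts$Precip)
  names(lookup)[3] <- paste0("precip_", k)

  df %>%
    mutate(lag_date = Date %m-% months(k)) %>%        # date k months earlier
    left_join(lookup, by = c("State", "lag_date")) %>% # NA when no match exists
    select(-lag_date)                                  # drop the helper column
}

# Apply the helper once per lag
result <- prices
for (k in 1:3) {
  result <- add_precip_lag(result, precip_ts, k)
}
result
```

With this toy data, the MA row dated 2021-05-27 picks up 3.4 for precip_1 and 2.6 for precip_2, and gets NA for precip_3 because there is no MA observation on 2021-02-27. The match is on the exact lagged date, so any gap in the precipitation series shows up as NA rather than as the nearest available reading.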