Fill missing date values with the value from the previous date

Question (votes: 0, answers: 2)

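I have share price data for 100 companies. The time series is daily, running from 1/1/2010 to 15/3/2023.

Because of weekends and public holidays, data is missing for some days. For company A, for example, the data looks like this: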

data_a <- data.frame(
  Date = as.Date(c("2010-03-01", "2010-04-01", "2010-05-01", "2010-06-01", "2010-08-01", "2010-09-01", "2010-11-01")),
  Price = c(91, 92, 93, 91, 90, 91, 93),
  Company = rep("A", 7)
)

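I want to fill in the data so that there are no gaps in the dates. Each missing date should take the value from the previous available date.

The resulting data frame should be: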

data <- data.frame(
  Date = as.Date(c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08", "2010-01-09", "2010-01-10", "2010-01-11")),
  Price = c(91, 91, 91, 92, 93, 91, 90, 90, 91, 93, 93),
  Company = rep("A", 11)
)

I have not dealt with anything like this before, so any help would be greatly appreciated. Thank you.

r dataframe date lubridate
2 Answers
0 votes

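The Date column in data_a is in ymd format, whereas the Date column in the desired output is in ydm format. So first convert your Date to the same format as the output, then complete() the records over the desired date range, and fill() the missing values (.direction = "up" matches the expected output shown in the question).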

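library(tidyverse)
library(lubridate)

data_a %>%
  # re-read each date as year-day-month, e.g. 2010-03-01 becomes 2010-01-03
  mutate(Date = ydm(Date)) %>%
  # add a row for every day from 1 to 11 January 2010
  complete(Date = seq(ydm("2010-01-01"), ydm("2010-11-01"), 1)) %>%
  # fill each gap from the next observed row
  fill(Price, Company, .direction = "up")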

         Date Price Company
1  2010-01-01    91       A
2  2010-01-02    91       A
3  2010-01-03    91       A
4  2010-01-04    92       A
5  2010-01-05    93       A
6  2010-01-06    91       A
7  2010-01-07    90       A
8  2010-01-08    90       A
9  2010-01-09    91       A
10 2010-01-10    93       A
11 2010-01-11    93       A
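
With .direction = "up", each gap takes the value of the next observed row. If the previous price should be carried forward instead, and the same step has to run for many companies at once, a minimal sketch could look like the following. The two-company sample data and the names prices and filled are made up for illustration, and the dates are assumed to be parsed correctly already.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: two companies, dates already stored as proper Date values.
prices <- data.frame(
  Date    = as.Date(c("2010-01-03", "2010-01-04", "2010-01-06",
                      "2010-01-03", "2010-01-05", "2010-01-06")),
  Price   = c(91, 92, 91, 50, 52, 51),
  Company = c("A", "A", "A", "B", "B", "B")
)

filled <- prices %>%
  group_by(Company) %>%
  # Add one row per calendar day between each company's first and last date.
  complete(Date = seq(min(Date), max(Date), by = "day")) %>%
  # Carry the last observed price forward into the newly added rows.
  fill(Price, .direction = "down") %>%
  ungroup()

filled
```

Grouping by Company keeps one company's last price from leaking into the next company's rows. Days before a company's first observation are not created here; if they are needed, the sequence bounds would have to be set explicitly and "downup" used to fill the leading gaps.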

0 votes

Use merge, seq.Date, and zoo::na.locf.

merge(data_a, data.frame(Date=do.call(seq.Date, c(as.list(range(data_a$Date)), 'month'))), all=TRUE) |>
  transform(Price=zoo::na.locf(Price), Company=Company[!is.na(Company)][1])
#         Date Price Company
# 1 2010-03-01    91       A
# 2 2010-04-01    92       A
# 3 2010-05-01    93       A
# 4 2010-06-01    91       A
# 5 2010-07-01    91       A
# 6 2010-08-01    90       A
# 7 2010-09-01    91       A
# 8 2010-10-01    91       A
# 9 2010-11-01    93       A
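
The code above keeps the dates as parsed (year-month-day) and therefore steps by month. If the strings are instead meant as year-day-month, as the expected output in the question suggests, a base R sketch without extra packages might look like this. The "%Y-%d-%m" reading of the dates is an assumption, and locf is a hypothetical helper written for this example, not a function from any package.

```r
# Raw strings as given in the question, read here as year-day-month (an assumption).
raw <- c("2010-03-01", "2010-04-01", "2010-05-01", "2010-06-01",
         "2010-08-01", "2010-09-01", "2010-11-01")
data_a <- data.frame(
  Date    = as.Date(raw, format = "%Y-%d-%m"),
  Price   = c(91, 92, 93, 91, 90, 91, 93),
  Company = "A"
)

# Hypothetical helper: last observation carried forward; leading NAs stay NA.
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the latest non-NA so far
  idx[idx == 0] <- NA                      # nothing observed yet
  x[idx]
}

# One row per calendar day between the first and last observed date.
grid  <- data.frame(Date = seq(min(data_a$Date), max(data_a$Date), by = "day"))
daily <- merge(grid, data_a, all.x = TRUE)

# Carry the previous price and company label into the added rows.
daily$Price   <- locf(daily$Price)
daily$Company <- locf(daily$Company)
daily
```

Unlike zoo::na.locf with its default settings, this helper keeps the vector length unchanged when the series starts with a missing value, which matters when assigning the result back into a data frame column.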

Data:

data_a <- structure(list(Date = structure(c(14669, 14700, 14730, 14761, 
14822, 14853, 14914), class = "Date"), Price = c(91, 92, 93, 
91, 90, 91, 93), Company = c("A", "A", "A", "A", "A", "A", "A"
)), class = "data.frame", row.names = c(NA, -7L))