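I have stock price data for 100 companies. The time series is daily, running from 1/1/2010 to 15/3/2023.
Because of weekends and public holidays, data is missing for some days. For example, the data for company A looks like this: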
data_a <- data.frame(
Date = as.Date(c("2010-03-01", "2010-04-01", "2010-05-01", "2010-06-01", "2010-08-01", "2010-09-01", "2010-11-01")),
Price = c(91, 92, 93, 91, 90, 91, 93),
Company = rep("A", 7)
)
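I want to smooth the data so that there are no gaps in the dates. Missing dates should be filled with the value from the previous available date.
The resulting data frame should be: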
data <- data.frame(
Date = as.Date(c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08", "2010-01-09", "2010-01-10", "2010-01-11")),
Price = c(91, 91, 91, 92, 93, 91, 90, 90, 91, 93, 93),
Company = rep("A", 11)
)
I haven't dealt with anything like this before, so any help would be greatly appreciated. Thanks.
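`Date` in the data frame `data_a` is in `ymd` format, while the `Date` in the desired output is in `ydm`. So first convert your `Date` to the same format as the output, then `complete` the records over the desired date range and `fill` the missing values. Note that the expected output takes each missing value from the next available date (2010-01-07 gets the 90 from 2010-01-08), which is why the code uses `.direction = "up"`; use `.direction = "down"` to carry the previous value forward instead.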
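library(tidyverse)
library(lubridate)
data_a %>%
  # Re-parse the dates as year-day-month, so "2010-03-01" becomes 3 January 2010
  mutate(Date = ydm(Date)) %>%
  # Add a row for every day from 1 to 11 January 2010
  complete(Date = seq(ydm("2010-01-01"), ydm("2010-11-01"), 1)) %>%
  # "up" fills each gap with the next available value
  fill(Price, Company, .direction = "up")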
   Date       Price Company
 1 2010-01-01    91 A
 2 2010-01-02    91 A
 3 2010-01-03    91 A
 4 2010-01-04    92 A
 5 2010-01-05    93 A
 6 2010-01-06    91 A
 7 2010-01-07    90 A
 8 2010-01-08    90 A
 9 2010-01-09    91 A
10 2010-01-10    93 A
11 2010-01-11    93 A
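Since the question mentions 100 companies, the same `complete` and `fill` steps can be run per company by grouping first. The following is only a sketch under some assumptions: `prices` is a hypothetical two-company sample standing in for the real data, its dates are already parsed correctly, each company is filled between its own first and last observation, and gaps take the previous value (`.direction = "down"`).

```r
library(dplyr)
library(tidyr)

# Hypothetical sample; the real data would hold all 100 companies
prices <- data.frame(
  Date = as.Date(c("2010-01-04", "2010-01-05", "2010-01-08",
                   "2010-01-04", "2010-01-07", "2010-01-08")),
  Price = c(91, 92, 90, 50, 51, 52),
  Company = rep(c("A", "B"), each = 3)
)

filled <- prices %>%
  group_by(Company) %>%
  # One row per calendar day between each company's first and last observation
  complete(Date = seq(min(Date), max(Date), by = "day")) %>%
  # Carry the last observed price forward into the newly added rows
  fill(Price, .direction = "down") %>%
  ungroup()

filled
```

To use one fixed range for every company instead, the `seq()` call could be given explicit start and end dates; rows before a company's first observation would then stay `NA` under `.direction = "down"`.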
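Using `merge` with `seq.Date` and `zoo::na.locf`. This version reads the dates as monthly observations (`ymd`) and carries the last observation forward.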
merge(data_a, data.frame(Date=do.call(seq.Date, c(as.list(range(data_a$Date)), 'month'))), all=TRUE) |>
transform(Price=zoo::na.locf(Price), Company=Company[!is.na(Company)][1])
#         Date Price Company
# 1 2010-03-01    91       A
# 2 2010-04-01    92       A
# 3 2010-05-01    93       A
# 4 2010-06-01    91       A
# 5 2010-07-01    91       A
# 6 2010-08-01    90       A
# 7 2010-09-01    91       A
# 8 2010-10-01    91       A
# 9 2010-11-01    93       A
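If the data is daily rather than monthly, the same base R idea might be adapted by stepping the sequence by `"day"`. This is a minimal sketch, assuming the zoo package is installed; `sample_a` and `all_days` are hypothetical names, and the dates are made-up trading days rather than the data from the question.

```r
# Hypothetical daily sample with a weekend-style gap (6 and 7 January missing)
sample_a <- data.frame(
  Date = as.Date(c("2010-01-04", "2010-01-05", "2010-01-08")),
  Price = c(91, 92, 90),
  Company = "A"
)

# Every calendar day between the first and last observation
all_days <- data.frame(Date = seq(min(sample_a$Date), max(sample_a$Date), by = "day"))

# Outer join on Date: days without an observation get NA in Price and Company
daily <- merge(sample_a, all_days, all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps the vector length unchanged
daily$Price <- zoo::na.locf(daily$Price, na.rm = FALSE)
daily$Company <- sample_a$Company[1]

daily
```

Passing `fromLast = TRUE` to `zoo::na.locf` would fill each gap from the next observation instead of the previous one.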
Data:
data_a <- structure(list(Date = structure(c(14669, 14700, 14730, 14761,
14822, 14853, 14914), class = "Date"), Price = c(91, 92, 93,
91, 90, 91, 93), Company = c("A", "A", "A", "A", "A", "A", "A"
)), class = "data.frame", row.names = c(NA, -7L))