如何在 R 中使用 hmmTMB 对观测状态进行一步预测?

问题描述 投票:0回答:1

我一直在使用 R 中的 hmmTMB 包来使用隐马尔可夫模型 (HMM) 来建模和预测油价走势。在用训练数据拟合模型后,我正在努力使用测试数据集上的拟合模型对观察到的状态(例如石油价格)进行一步预测。

# Load necessary libraries
library(hmmTMB)
library(tidyverse)

# Set seed for reproducibility
set.seed(123)

# Example dataset preparation (extended for 1 year)
df <- data.frame(
    day = seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "day"),
    OILPRICE = rnorm(366, 100, 10)
    ) %>%
    as_tibble()

# Splitting into training and test sets
split_index <- round(nrow(df) * 0.8)
training_set <- df[1:split_index, ] %>% as.data.frame()
testing_set <- df[(split_index + 1):nrow(df), ] %>% as.data.frame()

# Define a hidden Markov model with 2 states for the training set
hid1 <- MarkovChain$new(data = training_set, n_states = 2)
dists <- list(OILPRICE = "norm")
par0 <- list(OILPRICE = list(mean = c(110, 90), sd = c(1, 1)))
obs1 <- Observation$new(data = training_set, n_states = 2, dists = dists, par = par0)
par0 <- obs1$suggest_initial()
obs1 <- Observation$new(data = training_set, n_states = 2, dists = dists, par = par0)
hmm1 <- HMM$new(obs = obs1, hid = hid1)
hmm1$fit(silent = TRUE)

# Rename
data_plot <- training_set

# Color by most likely state sequence
data_plot$state <- factor(paste0("State ", hmm1$viterbi()))
ggplot(data_plot, aes(day, OILPRICE, col = state)) +
    geom_point() +
    scale_color_manual(values = pal, name = NULL)

# Attempting to make a one-step-ahead prediction
# ??? This is where I need guidance

enter image description here

如何使用 hmmTMB 在我的测试数据集 (testing_set) 上使用拟合模型 (hmm1) 对观察到的状态(例如第二天的 OILPRICE)进行一步预测?我正在寻找一种基于适合训练集的模型来预测未来油价值的方法。

我很欣赏有关如何使用 hmmTMB 包实现此目标的任何见解或示例。

r time-series prediction hidden-markov-models
1个回答
0
投票

hmmTMB 目前没有内置预测功能(截至 2024 年 2 月)。然而,可以根据拟合的 hmmTMB 模型计算预测分布。

预测分布是状态相关分布的混合,其中权重对应于处于不同状态的概率。因此,您可以使用以下工作流程:

  1. 获取最后观察到的数据行(即,在时间 n)的状态分布;我们称其为
    u
  2. 得到n+h时刻的状态分布为
    u %*% (tpm %^% h)
    ,其中
    tpm
    是状态过程的(一步)转移概率矩阵;也就是说,我们将
    u
    乘以 h 步转移概率矩阵
  3. 将预测分布作为估计的状态相关分布的混合,并按上一步中找到的状态概率进行加权

在步骤 3 中,您可以计算预测平均值(作为状态相关平均值的加权和),或预测分布的概率密度函数,如下面的代码所示。

预测时间平均值

黑点显示最后一次观察训练集后 60 天的预测分布平均值。

预测分布

线条显示了最后一次观察训练集后 60 天的预测分布的概率密度函数。正如你所看到的,它很可能一开始处于状态2,因此相应的状态相关分布具有更高的权重。一段时间后,随着状态分布接近平稳(长期)分布,预测分布停止变化。

代码

# Load packages
library(scico)
library(expm)

# Grid of oil prices to plot forecast distributions
grid <- seq(min(training_set$OILPRICE), 
            max(training_set$OILPRICE), 
            length = 100)

# Get state distribution at last (training) observation
sp <- hmm1$state_probs()
u <- sp[nrow(sp),]

# Get model parameters
tpm <- hid1$tpm()[,,1]
obspar <- obs1$par()[,,1]

# Loop over forecast times (over 60 days here)
h <- 1:60
mix <- list()
mix_mean <- rep(NA, length = length(h))
for(i in seq_along(h)) {
    # State distribution at time n + h
    dist_h <- u %*% (tpm %^% h[i])
    
    # Forecast distribution at time n + h
    mix[[i]] <- 
        dist_h[1] * dnorm(x = grid, 
                          mean = obspar["OILPRICE.mean", "state 1"], 
                          sd = obspar["OILPRICE.sd", "state 1"]) +
        dist_h[2] * dnorm(x = grid, 
                          mean = obspar["OILPRICE.mean", "state 2"], 
                          sd = obspar["OILPRICE.sd", "state 2"])
    
    # Mean of forecast distribution
    mix_mean[i] <- sum(dist_h * obspar["OILPRICE.mean",])
}

# Plot the forecast mean against time
df_mean <- data.frame(day = seq.Date(from = as.Date("2020-10-20"), 
                                     by = "day", length = length(h)),
                      mean = mix_mean)

ggplot(data_plot, aes(day, OILPRICE, col = state)) +
    geom_point() +
    geom_point(aes(y = mean, col = NULL), data = df_mean) +
    scale_color_manual(values = pal)

# Plot the forecast distributions for h = 1, 2, ..., 60
df_mix <- data.frame(h = rep(h, each = 100),
                 price = grid,
                 mix = unlist(mix))

ggplot(df_mix, aes(price, mix, col = h, group = h)) +
    geom_line() +
    scale_color_scico(name = "time") +
    labs(y = "forecast distribution")
© www.soinside.com 2019 - 2024. All rights reserved.