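RStudio help. I'm trying to fix a prediction model that uses a Markov chain in RStudio. I'm using a dataset called "veteranbenefits", which has …
I'm having trouble writing code to predict which state's veterans are more likely than those in other states to use GI Bill benefits.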
I get this error:
"Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent"
Here is the code and a preview of the dataset.
```r
library(dplyr)
library(markovchain)

veteranbenefits <- read.csv("veteranbenefits.csv")
veteranbenefits <- na.omit(veteranbenefits)

transition_matrix <- veteranbenefits %>%
  group_by(State.Name) %>%
  summarize(transition_prob = sum(Percent.of.Veterans.who.used.GI.Bill.Education.Benefits) /
                              sum(Percent.of.Veterans.who.used.GI.Bill.Education.Benefits))

markov_model <- new("markovchain",
                    states = transition_matrix$State.Name,
                    transitionMatrix = matrix(as.numeric(transition_matrix$transition_prob), nrow = 1))

predicted_probabilities <- steadyStates(markov_model)
highest_state <- names(predicted_probabilities)[which.max(predicted_probabilities)]
print(highest_state)
```
I tried to build a Markov chain to predict in which state veterans are more likely to use the GI Bill.
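In your summarize() call, the transition_prob calculation doesn't make sense: it sums the percentages and then divides that sum by itself, so it always gives 1 for every state. That isn't useful for a Markov chain model. It also yields only one value per state, so matrix(..., nrow = 1) builds a 1 × n matrix, whereas markovchain needs an n × n matrix so that the n state names can label both the rows and the columns. That size mismatch is what the dimnames error is reporting.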
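Both problems can be reproduced in base R without the real dataset. This sketch uses a hypothetical toy data frame (the state names A, B, C and the column Pct are made up, standing in for State.Name and the percentage column), applies the question's sum(x) / sum(x) formula per state, and then shows what happens when the result is squeezed into a single-row matrix.

```r
# Hypothetical toy data standing in for veteranbenefits (values made up)
toy <- data.frame(
  State.Name = c("A", "A", "B", "B", "C", "C"),
  Pct = c(10, 20, 5, 15, 8, 2)
)

# The question's formula, sum(x) / sum(x), evaluated per state
ratio <- tapply(toy$Pct, toy$State.Name, function(x) sum(x) / sum(x))
print(ratio)  # every state gets exactly 1

# One value per state, reshaped into a single row: 1 x 3 instead of 3 x 3
m <- matrix(as.numeric(ratio), nrow = 1)
print(dim(m))

# Attaching three state names to the single row fails with a dimnames error
msg <- tryCatch({
  rownames(m) <- names(ratio)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

The error text here mentions dimnames [1] because it is the row names that are assigned; the point is only that n names cannot label a dimension of length 1.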
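The transition matrix of a Markov model should be a square matrix, with the number of rows and the number of columns both equal to the number of states (in your case, the number of unique State.Name values). In addition, each row must sum to 1, because each row is the full probability distribution over the possible next states.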
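To make those two requirements concrete, here is a small base-R sketch with a hypothetical three-state matrix (the names A, B, C and all probabilities are made up). It checks both conditions and then computes the stationary distribution, the kind of result steadyStates() returns, as the left eigenvector of the matrix for eigenvalue 1, rescaled to sum to 1.

```r
# Hypothetical 3-state transition matrix (numbers made up for illustration)
states <- c("A", "B", "C")
P <- matrix(c(0.5, 0.3, 0.2,
              0.2, 0.6, 0.2,
              0.1, 0.3, 0.6),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# The two requirements: square, and every row sums to 1
stopifnot(nrow(P) == ncol(P),
          isTRUE(all.equal(unname(rowSums(P)), rep(1, nrow(P)))))

# Stationary distribution pi solves pi %*% P = pi, so it is the eigenvector of
# t(P) for eigenvalue 1; dividing by its sum turns it into probabilities
e <- eigen(t(P))
v <- Re(e$vectors[, which.min(abs(e$values - 1))])
pi_hat <- setNames(v / sum(v), states)
print(round(pi_hat, 4))

# State with the highest long-run probability
top <- names(pi_hat)[which.max(pi_hat)]
print(top)
```

For this particular matrix the stationary distribution works out to (5/21, 9/21, 7/21), so B is the most likely state in the long run.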
Try this:
```r
library(dplyr)
library(markovchain)

veteranbenefits <- read.csv("veteranbenefits.csv")
veteranbenefits <- na.omit(veteranbenefits)

# Estimated number of veterans who used GI Bill education benefits,
# per state and year (percentage x veteran population / 100)
state_year_totals <- veteranbenefits %>%
  group_by(State.Name, Year) %>%
  summarize(Total.Used.GI = sum(Percent.of.Veterans.who.used.GI.Bill.Education.Benefits *
                                  Veteran.population / 100),
            .groups = "drop")

# Each state's share of all GI Bill users across the years in the data
state_shares <- state_year_totals %>%
  group_by(State.Name) %>%
  summarize(Total.Used.GI = sum(Total.Used.GI), .groups = "drop") %>%
  mutate(share = Total.Used.GI / sum(Total.Used.GI))

states <- state_shares$State.Name
n <- length(states)

# Square n x n matrix whose rows all equal the share vector, so every row sums to 1.
# Assumption: the data record no moves between states, so the "next state" is
# simply drawn in proportion to each state's number of GI Bill users.
transition_matrix <- matrix(state_shares$share, nrow = n, ncol = n, byrow = TRUE,
                            dimnames = list(states, states))

# Creating Markov chain model
markov_model <- new("markovchain", states = states, transitionMatrix = transition_matrix)
predicted_probabilities <- steadyStates(markov_model)
highest_state <- colnames(predicted_probabilities)[which.max(predicted_probabilities)]
print(highest_state)
```
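If you do have observed sequences of categories (for example, a usage level recorded for the same unit year after year), the usual way to get a square, row-stochastic matrix is to count the observed transitions and divide each row by its total. The sketch below does this in base R on a hypothetical sequence (the "low"/"high" labels and the sequence itself are made up). The markovchain package also provides markovchainFit() for this kind of estimate, but the base-R version shows the arithmetic.

```r
# Hypothetical observed sequence of categories (made up for illustration)
seq_obs <- c("low", "high", "high", "low", "low", "high", "high", "high", "low")
lev <- sort(unique(seq_obs))

# Pair each observation with the one that follows it
from <- factor(head(seq_obs, -1), levels = lev)
to   <- factor(tail(seq_obs, -1), levels = lev)

counts <- table(from, to)       # square: one row and one column per category
P_hat <- prop.table(counts, 1)  # divide each row by its row total
print(P_hat)

# Every row of the estimate is a probability distribution
stopifnot(isTRUE(all.equal(as.numeric(rowSums(P_hat)), rep(1, length(lev)))))
```

Using the same factor levels for from and to guarantees the table is square even if some category never appears as a destination.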