Trying to coerce the observations of a new predictor to fit a logistic regression on the Medicaid Spending dataset

Question (votes: 0, answers: 1)

I am analyzing data from the Medicaid Spending by Drug dataset. Specifically, I want to fit a logistic regression in which y is based on CAGR_Avg_Spnd_Per_Dsg_Unt_18_22.

Unfortunately, with my code, the class and mode of the new variable are still character.

My inspiration for the "Up" and "Down" approach comes from the following:

# The library comes from Introduction to Statistical Learning: With Applications in R

library(ISLR)
attach(Smarket)
summary(Smarket)
# desired output:
glm.fit=glm(Direction~Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
            family=binomial,data=Smarket)
contrasts(Direction)

With glm.fit, I can make predictions, build a confusion matrix, and so on.
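For reference, the prediction and confusion-matrix steps mentioned above could look roughly like the sketch below. It uses simulated data rather than Smarket, so the data frame `toy` and the 0.5 cutoff are assumptions made for illustration only.

```r
# Simulated stand-in for Smarket: one predictor and a two-level factor response.
set.seed(1)
n <- 200
toy <- data.frame(Lag1 = rnorm(n))
toy$Direction <- factor(ifelse(toy$Lag1 + rnorm(n) > 0, "Up", "Down"),
                        levels = c("Down", "Up"))

# With a factor response, glm() treats the first level ("Down") as failure
# and models the probability of the other level ("Up").
toy.fit <- glm(Direction ~ Lag1, family = binomial, data = toy)

# Predicted probabilities of "Up" on the training data.
toy.probs <- predict(toy.fit, type = "response")

# Threshold at 0.5 to get class labels, then cross-tabulate against the truth.
toy.pred <- factor(ifelse(toy.probs > 0.5, "Up", "Down"),
                   levels = c("Down", "Up"))
conf.mat <- table(Predicted = toy.pred, Actual = toy$Direction)
conf.mat

# Training accuracy: share of observations classified correctly.
mean(toy.pred == toy$Direction)
```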

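However, when I check summary(drug.spending), my "Up" and "Down" values are characters, whereas the "Up" and "Down" values in the ISLR authors' data appear to be treated as numbers. The authors never provide code for creating "Up" and "Down" observations in a data frame!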

Here is my code:

library(dplyr)
library(tidyr)
library(psych)
library(leaps)
set.seed(1)

spending <- read.csv("medicaid_spending_by_drug_data_dictionary.csv")
drug.spending <-  spending %>%
  na.omit(spending) %>%
  filter(Mftr_Name == "Overall") %>%
  arrange(desc(Tot_Mftr)) %>%
  filter(duplicated(Gnrc_Name))
drug.spending <- drug.spending[!duplicated(drug.spending$Gnrc_Name),]
attach(drug.spending)



drug.spending <- drug.spending %>%
  mutate(CAGR_Direction = ifelse(CAGR_Avg_Spnd_Per_Dsg_Unt_18_22 > 0, 'Up', 'Down'))
drug.spending$CAGR_Direction <- factor(drug.spending$CAGR_Direction, levels = c('Down', 'Up')) # Update #1

summary(drug.spending)
contrasts(CAGR_Direction) #gives an error

I have tried different coercions, such as as.numeric() and as.integer(). I'm not quite sure where I went wrong...
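As a small illustration of how these coercions behave, here is a sketch on a hypothetical three-element vector (`direction.chr` is not part of the dataset). It suggests why coercing the character labels directly does not help, while going through factor() does.

```r
direction.chr <- c("Up", "Down", "Up")

# Coercing the character labels directly yields NA (normally with a warning),
# because "Up" and "Down" cannot be parsed as numbers.
chr.as.num <- suppressWarnings(as.numeric(direction.chr))

# Converting to a factor keeps the labels and stores integer codes underneath.
direction.fct <- factor(direction.chr, levels = c("Down", "Up"))
class(direction.fct)

# Integer codes follow the level order: 1 for "Down", 2 for "Up".
fct.codes <- as.integer(direction.fct)

# A 0/1 indicator, in case a numeric response is preferred.
up.indicator <- as.integer(direction.fct == "Up")
```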

r dataframe machine-learning logistic-regression
1 answer
0 votes
drug.spending <- drug.spending %>%
  mutate(CAGR_Direction = ifelse(CAGR_Avg_Spnd_Per_Dsg_Unt_18_22 > 0, 'Up', 'Down'))
drug.spending$CAGR_Direction <- factor(drug.spending$CAGR_Direction, levels = c('Down', 'Up'))

Consider CAGR_Direction
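One possible explanation for the error, offered as an assumption since the error message is not shown: attach() places a copy of the data frame on the search path, so a column created after the attach() call is not visible by its bare name. The sketch below reproduces that on simulated data; `toy.spending` and `Some_Predictor` are hypothetical names, not columns of the real dataset.

```r
# Simulated stand-in for drug.spending (values are made up).
set.seed(1)
toy.spending <- data.frame(
  CAGR_Avg_Spnd_Per_Dsg_Unt_18_22 = rnorm(100),
  Some_Predictor = rnorm(100)  # hypothetical predictor
)

# attach() takes a snapshot before CAGR_Direction exists.
attach(toy.spending)

toy.spending$CAGR_Direction <- factor(
  ifelse(toy.spending$CAGR_Avg_Spnd_Per_Dsg_Unt_18_22 > 0, "Up", "Down"),
  levels = c("Down", "Up")
)

# The attached copy predates the new column, so in a clean session
# the bare name CAGR_Direction is not found.
found.bare <- exists("CAGR_Direction")

detach(toy.spending)

# Referring to the column through the data frame works.
class(toy.spending$CAGR_Direction)
direction.contrasts <- contrasts(toy.spending$CAGR_Direction)
direction.contrasts

# Passing data = avoids attach() altogether when fitting the model.
direction.fit <- glm(CAGR_Direction ~ Some_Predictor,
                     family = binomial, data = toy.spending)
```

If this is indeed the cause, either calling contrasts(drug.spending$CAGR_Direction) or dropping attach() and passing data = drug.spending to glm() should avoid the problem.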
