The following code runs a very simple lm() and tries to summarise the results (factor levels, coefficients) in a small data frame:
df <- data.frame(star_sign = c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"),
                 y = c(1.1, 1.2, 1.4, 1.3, 1.8, 1.6, 1.4, 1.3, 1.2, 1.1, 1.5, 1.3),
                 stringsAsFactors = TRUE) # needed in R >= 4.0 so that star_sign is created as a factor
levels(df$star_sign) #alphabetical order
# fit a simple linear model
my_lm <- lm(y ~ 1 + star_sign, data = df)
summary(my_lm) # intercept is based on first level of factor, aquarius
# I want the levels to work properly 1..12 = Aries, Taurus...Pisces so I'm going to redefine the factor levels
df$my_levels <- c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces")
df$star_sign <- factor(df$star_sign, levels = df$my_levels)
my_lm <- lm(y ~ 1 + star_sign, data = df)
summary(my_lm) # intercept is based on first level of factor which is now Aries
# but for my model fit I want the reference level to be Virgo (because reasons)
df$star_sign_2 <- relevel(df$star_sign, ref = "Virgo")
my_lm <- lm(y ~ 1 + star_sign_2, data = df)
summary(my_lm)
df_results <- data.frame(factor_level = names(my_lm$coefficients), coeff = my_lm$coefficients )
# tidy up
rownames(df_results) <- 1:12
df_results$factor_level <- as.factor(gsub("star_sign_2", "", df_results$factor_level))
# change label of "(Intercept)" to "Virgo"
df_results$factor_level <- plyr::revalue(df_results$factor_level, c("(Intercept)" = "Virgo"))
levels(df_results$factor_level) # the levels are alphabetical + Virgo at the front (not same as display order from lm)
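The factor levels are not right: I want to sort df_results so that the star signs appear in the same order they had originally (Aries, Taurus ... Pisces), as in the df$my_levels column. I don't think I have a good understanding of how to manipulate factors and their labels/levels, so I'm struggling to work out how to do this.
Also, this is a long and clumsy piece of code. Is there a more concise way to do this kind of thing?
Thanks.
(Mathematically the model is obviously trivial, but that's fine for these purposes; I'm only interested in how to manipulate the output.)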
Here's an approach that uses the broom package (and dplyr) to extract the model coefficients:
library(broom)
library(dplyr)
broom::tidy(my_lm) %>%
  mutate(term = sub("star_sign_2", "", term),
         term = ifelse(term == "(Intercept)", "Virgo", term),
         term = factor(term, levels = unique(term)))
# A tibble: 12 x 5
   term        estimate std.error statistic p.value
   <fct>          <dbl>     <dbl>     <dbl>   <dbl>
 1 Virgo         1.6          NaN       NaN     NaN
 2 Aries        -0.500        NaN       NaN     NaN
 3 Taurus       -0.4          NaN       NaN     NaN
 4 Gemini       -0.2          NaN       NaN     NaN
 5 Cancer       -0.300        NaN       NaN     NaN
 6 Leo           0.20         NaN       NaN     NaN
 7 Libra        -0.2          NaN       NaN     NaN
 8 Scorpio      -0.3          NaN       NaN     NaN
 9 Sagittarius  -0.4          NaN       NaN     NaN
10 Capricorn    -0.500        NaN       NaN     NaN
11 Aquarius     -0.1          NaN       NaN     NaN
12 Pisces       -0.300        NaN       NaN     NaN
levels = unique(term) is a handy way to put the levels in the order in which they appear.
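To see why that works, here is a minimal base R sketch comparing the default level order with the order of appearance. The term vector below is a made-up, shortened stand-in for the model's term column, not the real output.

```r
# Hypothetical, shortened stand-in for the 'term' column of the tidied model
term <- c("Virgo", "Aries", "Taurus", "Gemini")

# Default behaviour: factor() sorts the unique values to build the levels
default_levels <- levels(factor(term))

# With levels = unique(term), the levels follow the order of first appearance
appearance_levels <- levels(factor(term, levels = unique(term)))

print(default_levels)
print(appearance_levels)
```

The second call keeps Virgo first because unique() returns values in the order they are first seen, and factor() uses the levels argument exactly as given.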
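My other suggestion is to keep a vector of the levels in the order you want, separate from the data frame, and refer to it whenever you need to establish the order. For example:
astro_order = c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces")
# messy but effective:
astro_order_virgo1 = factor(astro_order, levels = astro_order) %>%
  relevel("Virgo") %>%
  levels()
You could then replace the last step above with a factor() call whose levels argument is one of these vectors.
Keeping the level order separate like this is nice because (a) it doesn't change if you reorder the data frame, and (b) it works just as well if the data frame is long and contains repeated entries of the factor levels.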
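As a rough end-to-end sketch of the separate-vector idea, the following uses base R only (no broom or dplyr) to get the coefficient table back into Aries ... Pisces order. It rebuilds the data from the question so that it runs on its own. Using coef() and order() here is my own choice for illustration, not something taken from the code above.

```r
# Desired display order, kept outside the data frame
astro_order <- c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
                 "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces")

# Same data as in the question, with the factor levels set explicitly
df <- data.frame(
  star_sign = factor(astro_order, levels = astro_order),
  y = c(1.1, 1.2, 1.4, 1.3, 1.8, 1.6, 1.4, 1.3, 1.2, 1.1, 1.5, 1.3)
)
df$star_sign_2 <- relevel(df$star_sign, ref = "Virgo")
my_lm <- lm(y ~ 1 + star_sign_2, data = df)

# Pull out the coefficients; the intercept stands for the reference level (Virgo)
df_results <- data.frame(
  factor_level = sub("star_sign_2", "", names(coef(my_lm)), fixed = TRUE),
  coeff = unname(coef(my_lm)),
  stringsAsFactors = FALSE
)
df_results$factor_level[df_results$factor_level == "(Intercept)"] <- "Virgo"

# Apply the stored order to the labels, then sort the rows by it
df_results$factor_level <- factor(df_results$factor_level, levels = astro_order)
df_results <- df_results[order(df_results$factor_level), ]
rownames(df_results) <- NULL
print(df_results)

# The same vector also works on data with repeated values
repeated <- factor(c("Leo", "Aries", "Leo", "Pisces"), levels = astro_order)
print(levels(repeated))
```

Note that the Virgo row holds the intercept (the fitted value for Virgo), while every other row holds a difference from Virgo, so the coeff column mixes two kinds of quantity even after sorting.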