Functions na.rm=T, na.omit, is.finite, etc. don't work for the mean of a column


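I'm trying to calculate the mean of a variable in a large df, grouping the observations by Id and month, but none of the answers I found do what I expect; some of them even empty my sample, which makes them useless.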

Say the df is:

permno  company                 amihud       illiq  MonthYr
10026   J & J SNACK FOODS CORP  1.389026403  1.625  1990-01
10026   J & J SNACK FOODS CORP  1.028968686  NA     1990-01
10026   J & J SNACK FOODS CORP  NA           NA     1990-01
10026   J & J SNACK FOODS CORP  NA           NA     1990-01
10026   J & J SNACK FOODS CORP  Inf          NA     1990-01
10026   J & J SNACK FOODS CORP  Inf          NA     1990-02
10026   J & J SNACK FOODS CORP  0.891034483  NA     1990-02
10397   WERNER ENTERPRISES INC  0.443933917  NA     1990-01
10397   WERNER ENTERPRISES INC  0.255496848  NA     1990-01
10397   WERNER ENTERPRISES INC  0.891034483  NA     1990-02

structure(list(permno = c(10026L, 10026L, 10026L, 10026L, 10026L, 
10026L, 10397L, 10397L, 10397L, 10397L), date = structure(c(5L, 
6L, 1L, 2L, 3L, 4L, 7L, 8L, 9L, 10L), .Label = c("1/10/1990", 
"1/11/1990", "1/12/1990", "1/15/1990", "1/2/1990", "1/3/1990", 
"7/29/1998", "7/30/1998", "8/6/1998", "8/7/1998"), class = "factor"), 
    company = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L), .Label = c("J & J SNACK FOODS CORP", "WERNER ENTERPRISES INC"
    ), class = "factor"), price = c(11.75, 12.75, 13, 13, 12.375, 
    12.75, 12.25, 12.25, 10.75, 11.25), volume = c(36360L, 82710L, 
    22750L, 8574L, 40262L, 10150L, 25200L, 9000L, 333100L, 52200L
    ), amihud = c(1.389026403, 1.028968686, NA, Inf, Inf, 0.891034483, 
    0.255496848, NA, Inf, 0.891034483), illiq = c(1.625240831, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), MonthYr = structure(c(1L, 
    1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("1990-01", 
    "1990-02"), class = "factor")), .Names = c("permno", "date", 
"company", "price", "volume", "amihud", "illiq", "MonthYr"), class = "data.frame", row.names = c(NA, 
-10L))

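I want to compute the Amihud measure (a measure of illiquidity in finance, and therefore of risk). In short: for each stock (permno) and each month, I need the mean of the variable 'amihud', which I'll call 'illiq'.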
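Written out, the quantity asked for is a per-stock, per-month average. A sketch, assuming (as the expected output suggests) that days with NA or Inf are simply left out, with D_{i,m} the set of days of stock i (permno) in month m that have a finite amihud value:

$$
\mathrm{illiq}_{i,m} = \frac{1}{\lvert D_{i,m} \rvert} \sum_{d \in D_{i,m}} \mathrm{amihud}_{i,d}
$$

For permno 10026 in 1990-01, the dput() data above has the amihud values 1.389026403, 1.028968686, NA and Inf, so |D| = 2 and illiq = (1.389026403 + 1.028968686) / 2 ≈ 1.2090. The daily amihud value is usually the ratio of absolute return to dollar volume, |R_{i,d}| / DVOL_{i,d}; the question does not show how its column was computed, so treat that part as an assumption.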

I tried:

res <- smallcap %>%
        group_by(permno, MonthYr) %>%
        mean(amihud, na.rm=T) %>% 
        group_by(permno) 

I'm not sure how correct this is, but every attempt to drop or reassign the NAs and Infs has failed.
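The base-R sketch below suggests why these attempts misbehave. The toy vector is copied from the permno 10026, 1990-01 rows of the dput() data; the variable names are illustrative only. na.rm = TRUE removes NA but not Inf, so a single Inf makes the whole mean Inf, and mean() applied to a data frame (which is what the pipe hands it in the attempt above) does not average any column at all.

```r
# amihud values of permno 10026 in 1990-01, copied from the dput() data
x <- c(1.389026403, 1.028968686, NA, Inf)

# na.rm = TRUE drops the NA, but the Inf is kept and swamps the mean
with_inf <- mean(x, na.rm = TRUE)
print(with_inf)                      # Inf

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so only two values remain
finite_only <- mean(x[is.finite(x)])
print(finite_only)                   # (1.389026403 + 1.028968686) / 2

# mean() of a data frame does not look inside its columns: it warns
# "argument is not numeric or logical: returning NA" and returns NA
on_df <- mean(data.frame(amihud = x))
print(on_df)                         # NA
```

So the fix is twofold: filter out non-finite values, and compute the mean inside a per-group summary rather than on the whole data frame.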

Expected result (setting aside whether the numbers are exactly right for this example), without the amihud variable:

permno  company                 illiq  MonthYr
10026   J & J SNACK FOODS CORP  1.65   1990-01
10026   J & J SNACK FOODS CORP  0.87   1990-02
10397   WERNER ENTERPRISES INC  0.25   1990-01
10397   WERNER ENTERPRISES INC  0.55   1990-02

Thanks for any hints.

Tags: r, mean, na
1 Answer

You need to do the following:

#since you don't care about the Infs, convert them to NAs
#so that mean() drops them along with the other NAs
#(we set na.rm=TRUE below)
df$amihud[is.infinite(df$amihud)] <- NA

library(dplyr)
#you need to use summarise to calculate the means as below:
res <- df %>%
          select(permno, company, MonthYr, amihud) %>%
          group_by(permno, company, MonthYr) %>%
          summarise(illiq = mean(amihud, na.rm=TRUE))
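If you would rather not overwrite amihud in df, here is a base-R sketch of the same per-permno, per-month mean. It swaps dplyr's summarise() for aggregate() and filters with is.finite() inside the summary function, which drops NA, NaN, Inf and -Inf in one step. finite_mean is a hypothetical helper name, and only the needed columns are rebuilt from the dput() data.

```r
# Only the columns needed here, copied from the question's dput() data
df <- data.frame(
  permno  = c(10026L, 10026L, 10026L, 10026L, 10026L,
              10026L, 10397L, 10397L, 10397L, 10397L),
  company = rep(c("J & J SNACK FOODS CORP", "WERNER ENTERPRISES INC"),
                times = c(6, 4)),
  amihud  = c(1.389026403, 1.028968686, NA, Inf, Inf, 0.891034483,
              0.255496848, NA, Inf, 0.891034483),
  MonthYr = rep(c("1990-01", "1990-02", "1990-01", "1990-02"),
                times = c(4, 2, 2, 2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: average of the finite values only
# (is.finite() is FALSE for NA, NaN, Inf and -Inf)
finite_mean <- function(x) mean(x[is.finite(x)])

# na.pass keeps the NA rows so that finite_mean(), not aggregate(), filters them
res <- aggregate(amihud ~ permno + company + MonthYr, data = df,
                 FUN = finite_mean, na.action = na.pass)
names(res)[names(res) == "amihud"] <- "illiq"

# Sort like the dplyr output: by permno, then by month
res <- res[order(res$permno, res$MonthYr), ]
rownames(res) <- NULL
print(res)
```

The illiq column should agree with the dplyr output below. Within dplyr, the same filter can go straight into the summary, e.g. summarise(illiq = mean(amihud[is.finite(amihud)])), which also leaves df unchanged.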

Output:

> res
Source: local data frame [4 x 4]
Groups: permno, company

  permno                company MonthYr     illiq
1  10026 J & J SNACK FOODS CORP 1990-01 1.2089975
2  10026 J & J SNACK FOODS CORP 1990-02 0.8910345
3  10397 WERNER ENTERPRISES INC 1990-01 0.2554968
4  10397 WERNER ENTERPRISES INC 1990-02 0.8910345

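P.S.: The values in your expected output probably come from the full data set. In this sample, 10026 J & J SNACK FOODS CORP has only one finite value in 1990-02 (the other one is Inf), so its mean in the output should be that value, 0.8910345, not 0.87.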
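The point above comes down to counting finite values per group. The sketch below checks the 1990-02 case and also covers an edge case the sample does not contain: a permno-month whose amihud values are all NA or Inf. With mean(amihud, na.rm=TRUE) such a group gets NaN, because the mean of an empty vector is NaN; the hypothetical helper finite_mean_or_na returns NA instead, which may be easier to handle downstream.

```r
# Hypothetical helper: mean of the finite values, NA when a group has none
finite_mean_or_na <- function(x) {
  x <- x[is.finite(x)]                  # drop NA, NaN, Inf and -Inf
  if (length(x) == 0) NA_real_ else mean(x)
}

one_value <- finite_mean_or_na(c(Inf, 0.891034483))  # 10026, 1990-02
no_values <- finite_mean_or_na(c(NA, Inf))           # an all-NA/Inf month
plain     <- mean(numeric(0))                        # what plain mean() gives

print(c(one_value, no_values, plain))
```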

© www.soinside.com 2019 - 2024. All rights reserved.