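I want to add a new column to my data frame that ranks companies within each date (here each date is a quarter end, so ranking by month would also work). Companies should be ranked by their assets in that quarter's month.
The number of companies (id) differs from quarter to quarter: some new companies may enter and some old ones may drop out.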
I want to go from this
# id assets date
# 1 X1 50 1994-03-31
# 2 X2 120 1994-03-31
# 3 X3 530 1994-03-31
# 4 X4 24 1994-03-31
# 6 X3 57 1994-06-30
# 7 X1 445 1994-06-30
# 8 X10 525 1994-06-30
to this
# id assets date rank
# 1 X1 50 1994-03-31 3
# 2 X2 120 1994-03-31 2
# 3 X3 530 1994-03-31 1
# 4 X4 24 1994-03-31 4
# 6 X3 57 1994-06-30 3
# 7 X1 445 1994-06-30 2
# 8 X10 525 1994-06-30 1
I have tried this:
temp_asset_rank <- temp_asset_rank %>%
mutate(yearx = year(date)) %>%
mutate(month = month(date)) %>%
group_by(yearx, month) %>%
mutate(ranking = rank(temp_asset_rank$assets, na.last = NA, ties.method = c("average"))) %>%
ungroup()
But the result is:
Error: Column `ranking` must be length 11788 (the group size) or one, not 1188563
As you can see, my dataset is actually much larger and also contains additional columns.
Changing
group_by(yearx, month)
to
group_by(yearx) %>%
group_by(month)
does not work either.
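The error comes from `temp_asset_rank$assets`: inside a grouped `mutate()`, the `$` lookup goes around the grouping and returns the whole column (1188563 values) instead of the current group's values (11788 rows in the failing group). Chaining `group_by(yearx) %>% group_by(month)` does not help, because a second `group_by()` replaces the existing grouping unless `.add = TRUE` is given. Below is a minimal sketch of the corrected attempt, assuming `year()` and `month()` come from lubridate and using the example rows as a stand-in for `temp_asset_rank`. Two further changes are assumptions about the intent: `-assets`, so the largest firm gets rank 1 as in the desired output, and `na.last = "keep"`, because `na.last = NA` drops missing values and would again make the result shorter than the group.

```r
library(dplyr)
library(lubridate)

# Stand-in for temp_asset_rank, built from the example rows in the question
temp_asset_rank <- data.frame(
  id = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"),
  assets = c(50, 120, 530, 24, 57, 445, 525),
  date = as.Date(c("1994-03-31", "1994-03-31", "1994-03-31", "1994-03-31",
                   "1994-06-30", "1994-06-30", "1994-06-30"))
)

temp_asset_rank <- temp_asset_rank %>%
  mutate(yearx = year(date), month = month(date)) %>%
  group_by(yearx, month) %>%
  # Bare `assets` refers to the current group only; negating it ranks the largest as 1.
  # na.last = "keep" gives NA assets an NA rank instead of dropping them.
  mutate(ranking = rank(-assets, na.last = "keep", ties.method = "average")) %>%
  ungroup()

print(temp_asset_rank)
```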
Can you help me?
A base R solution:
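# Sort by assets (largest first), then number the rows within each date.
# seq_along is used because seq.int(x) on a one-row group would return 1:x.
within(df[order(df$assets, decreasing = TRUE),],
{rank <- ave(assets, date, FUN = seq_along)})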
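If the original row order should be kept, the ranking can be done inside `ave()` instead of sorting first. A minimal sketch, assuming tied assets should share the lowest rank (`ties.method = "min"`), which differs from numbering sorted rows, where ties get consecutive ranks:

```r
df <- data.frame(
  id = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"),
  assets = c(50L, 120L, 530L, 24L, 57L, 445L, 525L),
  date = c("1994-03-31", "1994-03-31", "1994-03-31", "1994-03-31",
           "1994-06-30", "1994-06-30", "1994-06-30")
)

# Rank the negated assets within each date so the largest firm gets rank 1;
# ave() puts the results back in the original row order
df$rank <- ave(-df$assets, df$date, FUN = function(x) rank(x, ties.method = "min"))

print(df)
```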
A tidyverse solution:
library(tidyverse)
df %>%
mutate(idx = row_number()) %>%
arrange(desc(assets)) %>%
group_by(date) %>%
mutate(rank = row_number()) %>%
ungroup() %>%
arrange(idx) %>%
select(-idx)
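dplyr also has ranking helpers that work inside a grouped `mutate()`, which removes the need for the `idx` bookkeeping because the rows are never reordered. A sketch using `min_rank(desc(assets))`; note that `min_rank()` gives tied assets the same rank, whereas `row_number()` above breaks ties by position:

```r
library(dplyr)

df <- data.frame(
  id = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"),
  assets = c(50L, 120L, 530L, 24L, 57L, 445L, 525L),
  date = c("1994-03-31", "1994-03-31", "1994-03-31", "1994-03-31",
           "1994-06-30", "1994-06-30", "1994-06-30")
)

ranked <- df %>%
  group_by(date) %>%
  # desc() reverses the ordering so the largest assets get rank 1
  mutate(rank = min_rank(desc(assets))) %>%
  ungroup()

print(ranked)
```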
Data:
df <- structure(list(id = c("X1", "X2", "X3", "X4", "X3", "X1", "X10"),
assets = c(50L, 120L, 530L, 24L, 57L, 445L, 525L),
date = c("1994-03-31", "1994-03-31", "1994-03-31", "1994-03-31", "1994-06-30",
"1994-06-30", "1994-06-30")), class = "data.frame", row.names = c(NA, -7L))