Ranking within groups in R

Question

How can I create a rank column in R? Here is an example. This is what I have:


Date       group
12/5/2020    A
12/5/2020    A
11/7/2020    A
11/7/2020    A
11/9/2020    B
11/9/2020    B
10/8/2020    B

This is what I want:

Date       group   rank
12/5/2020    A      2
12/5/2020    A      2
11/7/2020    A      1
11/7/2020    A      1
11/9/2020    B      2
11/9/2020    B      2
10/8/2020    B      1
r dataframe rank
Answers

tidyverse

library(dplyr)
group_by(dat, group) %>%
  mutate(rank = as.integer(factor(Date))) %>%
  ungroup()
# # A tibble: 7 x 3
#   Date      group  rank
#   <chr>     <chr> <int>
# 1 12/5/2020 A         2
# 2 12/5/2020 A         2
# 3 11/7/2020 A         1
# 4 11/7/2020 A         1
# 5 11/9/2020 B         2
# 6 11/9/2020 B         2
# 7 10/8/2020 B         1

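This relies on lexicographic (text) sorting of the Date column. That happens to work for this sample, but it will fail in general, for example when the dates span more than one year or when a single-digit month is compared with a two-digit one ("9/1/2020" sorts after "10/8/2020" as text). A better approach is to convert the column to something that orders correctly, such as a Date object: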

dat %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  group_by(group) %>%
  mutate(rank = as.integer(factor(Date))) %>%
  ungroup()
# # A tibble: 7 x 3
#   Date       group  rank
#   <date>     <chr> <int>
# 1 2020-12-05 A         2
# 2 2020-12-05 A         2
# 3 2020-11-07 A         1
# 4 2020-11-07 A         1
# 5 2020-11-09 B         2
# 6 2020-11-09 B         2
# 7 2020-10-08 B         1
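
To make the failure mode of text-based ranking concrete, here is a minimal base R sketch. The three dates are hypothetical (they are not from the question) and were picked so that text order and calendar order disagree; the variable names are illustrative only.

```r
# Hypothetical dates, not taken from the question.
x <- c("9/1/2020", "10/8/2020", "3/5/2021")

# factor() on character data sorts its levels as text,
# so "10/..." comes before "3/..." and "9/...".
rank_chr <- as.integer(factor(x))

# After parsing, the levels follow calendar order.
rank_date <- as.integer(factor(as.Date(x, format = "%m/%d/%Y")))

print(rank_chr)   # 3 1 2
print(rank_date)  # 1 2 3
```

The string version puts the 2021 date in the middle and the September date last, which is why the conversion step matters for anything beyond this small sample.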

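With a proper Date column we can also use a better-suited ranking function, dense_rank (@akrun suggested it first in his answer... honestly, I was building up to it):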

dat %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  group_by(group) %>%
  mutate(rank = dense_rank(Date)) %>%
  ungroup()
# # A tibble: 7 x 3
#   Date       group  rank
#   <date>     <chr> <int>
# 1 2020-12-05 A         2
# 2 2020-12-05 A         2
# 3 2020-11-07 A         1
# 4 2020-11-07 A         1
# 5 2020-11-09 B         2
# 6 2020-11-09 B         2
# 7 2020-10-08 B         1
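
What makes dense_rank a good fit here is how it treats ties: tied values share a rank and the next distinct value gets the next integer, with no gaps. The sketch below contrasts it with dplyr's min_rank on the group A dates from the question; it assumes dplyr is installed, and the variable names are illustrative.

```r
library(dplyr)

# Group A dates from the question, parsed as Date.
d <- as.Date(c("12/5/2020", "12/5/2020", "11/7/2020", "11/7/2020"),
             format = "%m/%d/%Y")

# dense_rank: ties share a rank, no gaps between ranks.
dense <- dense_rank(d)

# min_rank: ties share the lowest rank, later ranks skip ahead.
gapped <- min_rank(d)

print(dense)   # 2 2 1 1
print(gapped)  # 3 3 1 1
```

Only the dense version reproduces the 1, 2 numbering asked for in the question.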


An alternative using tidyverse and dense_rank:

library(tidyverse)

# Ensure Date is a Date object
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")

df %>%
  group_by(group) %>%
  arrange(Date) %>%
  mutate(rank = dense_rank(Date))

Output

# A tibble: 7 x 3
# Groups:   group [2]
  Date       group  rank
  <date>     <chr> <int>
1 2020-10-08 B         1
2 2020-11-07 A         1
3 2020-11-07 A         1
4 2020-11-09 B         2
5 2020-11-09 B         2
6 2020-12-05 A         2
7 2020-12-05 A         2


We can use dense_rank after converting 'Date' to the Date class:

library(dplyr)
library(lubridate)
df1 %>% 
      group_by(group) %>% 
      mutate(rank = dense_rank(mdy(Date)))
# A tibble: 7 x 3
# Groups:   group [2]
#  Date      group  rank
#  <chr>     <chr> <int>
#1 12/5/2020 A         2
#2 12/5/2020 A         2
#3 11/7/2020 A         1
#4 11/7/2020 A         1
#5 11/9/2020 B         2
#6 11/9/2020 B         2
#7 10/8/2020 B         1
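
None of the answers define the input data frame, so for completeness here is a self-contained sketch that rebuilds the question's sample and computes the same ranks in base R only. Using ave() with match() is a stand-in for dense_rank, not something the answers above propose; df1 is re-typed from the question.

```r
# Sample data re-typed from the question.
df1 <- data.frame(
  Date  = c("12/5/2020", "12/5/2020", "11/7/2020", "11/7/2020",
            "11/9/2020", "11/9/2020", "10/8/2020"),
  group = c("A", "A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Parse the text dates so comparisons follow calendar order.
d <- as.Date(df1$Date, format = "%m/%d/%Y")

# Within each group, a value's dense rank is its position among
# the sorted distinct values of that group.
df1$rank <- as.integer(
  ave(as.numeric(d), df1$group,
      FUN = function(x) match(x, sort(unique(x))))
)

print(df1)
```

The resulting rank column is 2, 2, 1, 1, 2, 2, 1, matching the desired output in the question.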