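The ACCOUNT table contains the list of accounts held by customers. A customer may hold multiple accounts of each type. The table is laid out as follows: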
CUSTOMER_NUMBER  CUSTOMER_AGE  ACCOUNT_NUMBER  ACCOUNT_TYPE
123              27            A987            Home Loan
123              27            B6547           Credit Card
124              42            B7531           Credit Card
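Show how to determine the proportion of customers who hold a credit card in each of the following age bands: 18-29, 30-44, 45-59, 60+.
The output should look like this: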
Age Band   % with Credit Card
18-29      44.9%
30-44      41.2%
45-59      45.5%
60+        43.0%
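How can I get the expected result shown in the table above, i.e. the percentage of people in each age group who hold a credit card?
Please help me with the R code.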
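One dplyr possibility could be the following. Note that it reports, for each age band, the percentage of account rows that are credit cards, so a customer with several accounts is counted once per account: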
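library(dplyr)

df %>%
  # right = FALSE makes each interval [lower, upper), so the breaks are
  # the lowest age of each band: [18,30), [30,45), [45,60), [60,Inf)
  group_by(grp = cut(CUSTOMER_AGE,
                     breaks = c(18, 30, 45, 60, Inf),
                     labels = c("18-29", "30-44", "45-59", "60+"),
                     right = FALSE)) %>%
  # percentage of rows in the band whose account type is a credit card
  summarise(res = sum(ACCOUNT_TYPE == "Credit_Card") / n() * 100)

  grp     res
  <fct> <dbl>
1 18-29    50
2 30-44   100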
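
The band edges are easy to get wrong by one year, so it may be worth checking how cut() treats boundary ages. A minimal sketch, assuming each band is meant to include both of its printed ends (29 belongs to 18-29, 30 starts 30-44); the ages vector is made-up test input, not data from the ACCOUNT table:

```r
# Made-up boundary ages, used only to check the binning
ages <- c(18, 29, 30, 44, 45, 59, 60, 75)

# right = FALSE closes each interval on the left:
# [18,30), [30,45), [45,60), [60,Inf)
bands <- cut(ages,
             breaks = c(18, 30, 45, 60, Inf),
             labels = c("18-29", "30-44", "45-59", "60+"),
             right = FALSE)

# Show each age next to the band it was assigned to
print(data.frame(age = ages, band = bands))
```

Ages below 18 fall outside every interval and come back as NA. With breaks of c(18, 29, 44, 59, Inf) and right = FALSE, age 29 would instead land in the 30-44 band.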
Sample data:
df <- read.table(text = "CUSTOMER_NUMBER CUSTOMER_AGE ACCOUNT_NUMBER ACCOUNT_TYPE
123 27 A987 Home_Loan
123 27 B6547 Credit_Card
124 42 B7531 Credit_Card", header = TRUE,
stringsAsFactors = FALSE)
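
The question asks for the share of customers, whereas a row-level ratio counts a customer once per account. A sketch of a customer-level variant, assuming each CUSTOMER_NUMBER has a single age and that credit cards are spelled Credit_Card as in the sample data; the names has_cc, age_band, pct_with_cc and pct_label are illustrative only:

```r
library(dplyr)

# Same sample data as above, repeated so the block runs on its own
df <- read.table(text = "CUSTOMER_NUMBER CUSTOMER_AGE ACCOUNT_NUMBER ACCOUNT_TYPE
123 27 A987 Home_Loan
123 27 B6547 Credit_Card
124 42 B7531 Credit_Card", header = TRUE,
stringsAsFactors = FALSE)

res <- df %>%
  # collapse to one row per customer: does this customer hold any credit card?
  group_by(CUSTOMER_NUMBER, CUSTOMER_AGE) %>%
  summarise(has_cc = any(ACCOUNT_TYPE == "Credit_Card"), .groups = "drop") %>%
  # assign each customer to an age band; intervals are [lower, upper)
  mutate(age_band = cut(CUSTOMER_AGE,
                        breaks = c(18, 30, 45, 60, Inf),
                        labels = c("18-29", "30-44", "45-59", "60+"),
                        right = FALSE)) %>%
  # share of customers in each band with at least one credit card
  group_by(age_band) %>%
  summarise(pct_with_cc = 100 * mean(has_cc), .groups = "drop") %>%
  # formatted label in the style of the requested output, e.g. "44.9%"
  mutate(pct_label = sprintf("%.1f%%", pct_with_cc))

print(res)
```

On the three sample rows both customers hold a credit card, so both populated bands come out at 100%, whereas the row-level ratio gives 50% for 18-29 because customer 123 also has a home loan.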