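I'd like to multiply a vector of numbers (the 'Sum_By_Group' column) by a vector of percentages (the 'Percent' column) to distribute each group's total across its IDs, round the results, and end up with the same total I started with. In other words, I want the 'Distribution_Post_Round' column to equal the 'Sum_By_Group' column.
Below is an example of the problem I'm running into. In group A, I multiply 'Percent' by 'Sum_By_Group' and end up with 3 in ID 1, 3 in ID 2, and 1 in ID 5, for a total of 7. The 'Sum_By_Group' and 'Distribution_Post_Round' columns match for group A, which is what I want. In group B, I multiply 'Percent' by 'Sum_By_Group' and end up with 1 in ID 8 and 1 in ID 10, for a total of 2. I want the 'Distribution_Post_Round' column to be 3 for group B.
Is there a way to do this without using a loop, subsetting the data frame, and then joining the pieces back together?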
library(dplyr)
df = data.frame('Group' = c(rep('A', 7), rep('B', 5)),
'ID' = c(1:12),
'Percent' = c(0.413797750, 0.385366840, 0.014417571, 0.060095668, 0.076399650,
0.019672573, 0.030249949, 0.381214519, 0.084121796, 0.438327886,
0.010665749, 0.085670050),
'Sum_By_Group' = c(rep(7,7), rep(3, 5)))
df$Distribute_By_ID = round(df$Percent * df$Sum_By_Group, 0)
df_round = aggregate(Distribute_By_ID ~ Group, data = df, sum)
names(df_round)[names(df_round) == 'Distribute_By_ID'] = 'Distribution_Post_Round'
df = left_join(df, df_round, by = 'Group')
df
Group ID Percent Sum_By_Group Distribute_By_ID Distribution_Post_Round
A 1 0.41379775 7 3 7
A 2 0.38536684 7 3 7
A 3 0.01441757 7 0 7
A 4 0.06009567 7 0 7
A 5 0.07639965 7 1 7
A 6 0.01967257 7 0 7
A 7 0.03024995 7 0 7
B 8 0.38121452 3 1 2
B 9 0.08412180 3 0 2
B 10 0.43832789 3 1 2
B 11 0.01066575 3 0 2
B 12 0.08567005 3 0 2
Thanks so much for your help. Please let me know if any further clarification is needed.
Wow, who knew someone had already written a package with a function that solves exactly this problem. Kudos to that team: https://cran.r-project.org/web/packages/sfsmisc/index.html
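The answer doesn't say which function it means. As far as I can tell, the likely candidate in sfsmisc is `roundfixS()`, which rounds a vector to integers while keeping its sum fixed. Here is a minimal sketch of how it might be applied per group with base R's `ave()`. Treat the choice of function as my assumption rather than something stated above. The call is guarded so the script still runs when the package is not installed.

```r
# Data from the question
df <- data.frame(
  Group = c(rep("A", 7), rep("B", 5)),
  ID = 1:12,
  Percent = c(0.413797750, 0.385366840, 0.014417571, 0.060095668, 0.076399650,
              0.019672573, 0.030249949, 0.381214519, 0.084121796, 0.438327886,
              0.010665749, 0.085670050),
  Sum_By_Group = c(rep(7, 7), rep(3, 5))
)

# Assumption: roundfixS() is the function the answer refers to.
# It expects its input to sum to (nearly) an integer, which holds here
# because the percentages in each group sum to 1.
if (requireNamespace("sfsmisc", quietly = TRUE)) {
  df$Final <- ave(df$Percent * df$Sum_By_Group, df$Group,
                  FUN = sfsmisc::roundfixS)
  # Per-group totals of the rounded values, to compare with Sum_By_Group
  print(tapply(df$Final, df$Group, sum))
}
```

Which row in a group receives the extra unit depends on the `method` argument, so it is worth checking the package documentation before relying on a particular allocation.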
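library(dplyr)
df %>%
  # Whole units each ID gets for sure, and the fractional part left over
  mutate(dividend = floor(Percent * Sum_By_Group),
         remainder = Percent * Sum_By_Group - dividend) %>%
  group_by(Group) %>%
  # Within each group, put the largest fractional parts first
  arrange(desc(remainder), .by_group = TRUE) %>%
  mutate(delivered = sum(dividend),   # units already handed out in the group
         rownumber = 1:n(),
         # Give one extra unit to the top rows until the group total is reached
         lastdelivery = if_else(rownumber <= Sum_By_Group - delivered, 1, 0),
         Final = dividend + lastdelivery) %>%
  ungroup()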
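For what it's worth, the same idea (floor everything, then hand the leftover units to the rows with the largest fractional parts) can also be sketched in base R without reordering the rows. The helper name `largest_remainder` is made up for this illustration, and the sketch assumes each group's unrounded values sum to a whole number, as they do in the question.

```r
# Data from the question
df <- data.frame(
  Group = c(rep("A", 7), rep("B", 5)),
  ID = 1:12,
  Percent = c(0.413797750, 0.385366840, 0.014417571, 0.060095668, 0.076399650,
              0.019672573, 0.030249949, 0.381214519, 0.084121796, 0.438327886,
              0.010665749, 0.085670050),
  Sum_By_Group = c(rep(7, 7), rep(3, 5))
)

# Hypothetical helper: round a vector to integers while preserving its total
largest_remainder <- function(x) {
  base <- floor(x)
  # Units still missing after flooring (the total is assumed to be integral)
  shortfall <- round(sum(x)) - sum(base)
  # Positions of the largest fractional parts get one extra unit each
  bump <- order(x - base, decreasing = TRUE)[seq_len(shortfall)]
  base[bump] <- base[bump] + 1
  base
}

# ave() applies the helper within each group and keeps the original row order
df$Final <- ave(df$Percent * df$Sum_By_Group, df$Group, FUN = largest_remainder)

# Per-group totals now match Sum_By_Group: A = 7, B = 3
print(tapply(df$Final, df$Group, sum))
```

With the question's data, group B's missing unit goes to ID 10, since it has the largest fractional part (about 0.315) in that group.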