How do I distribute a numeric vector by a vector of percentages, round the results, and always get the same total in R?


Problem Summary

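I want to multiply a numeric vector (the "Sum_By_Group" column) by a vector of percentages (the "Percent" column) to distribute each group's total across its IDs, round the results, and end up with the same total I started with. In other words, I want the "Distribution_Post_Round" column to match the "Sum_By_Group" column.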

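Below is an example of the problem I am running into. In group A, I multiply "Percent" by "Sum_By_Group" and end up with 3 in ID 1, 3 in ID 2, and 1 in ID 5, for a total of 7. For group A, the "Sum_By_Group" and "Distribution_Post_Round" columns match, which is what I want. In group B, I multiply "Percent" by "Sum_By_Group" and end up with 1 in ID 8 and 1 in ID 10, for a total of 2. I want the "Distribution_Post_Round" column for group B to be 3.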
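To make the group B shortfall concrete, the snippet below redoes the arithmetic for group B alone, using the percentages from the example data. Every product has a fractional part below 0.5, so `round()` rounds each one down, and the fractions it throws away add up to exactly the missing unit.

```r
# Group B from the question: a total of 3 split by five percentages
pct_b <- c(0.381214519, 0.084121796, 0.438327886, 0.010665749, 0.085670050)

raw <- pct_b * 3        # about 1.144 0.252 1.315 0.032 0.257
rounded <- round(raw)   # every fractional part is below 0.5, so all round down
print(rounded)          # 1 0 1 0 0
print(sum(rounded))     # 2, one short of the target of 3

# The discarded fractional parts add up to (about) the lost unit
print(sum(raw - floor(raw)))
```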

Is there a way to do this without using a loop, subsetting the data frame, and then joining the data frames back together?

Example

library(dplyr)
df = data.frame('Group' = c(rep('A', 7), rep('B', 5)),
                  'ID' = c(1:12),
                  'Percent' = c(0.413797750, 0.385366840, 0.014417571, 0.060095668, 0.076399650,
                                0.019672573, 0.030249949, 0.381214519, 0.084121796, 0.438327886,
                                0.010665749, 0.085670050),
                  'Sum_By_Group' = c(rep(7,7), rep(3, 5)))
df$Distribute_By_ID = round(df$Percent * df$Sum_By_Group, 0)
df_round = aggregate(Distribute_By_ID ~ Group, data = df, sum)
names(df_round)[names(df_round) == 'Distribute_By_ID'] = 'Distribution_Post_Round'
df = left_join(df, df_round, by = 'Group')
df
  Group ID    Percent Sum_By_Group Distribute_By_ID Distribution_Post_Round
      A  1 0.41379775            7                3                       7
      A  2 0.38536684            7                3                       7
      A  3 0.01441757            7                0                       7
      A  4 0.06009567            7                0                       7
      A  5 0.07639965            7                1                       7
      A  6 0.01967257            7                0                       7
      A  7 0.03024995            7                0                       7
      B  8 0.38121452            3                1                       2
      B  9 0.08412180            3                0                       2
      B 10 0.43832789            3                1                       2
      B 11 0.01066575            3                0                       2
      B 12 0.08567005            3                0                       2

Thanks very much for your help. Please let me know if anything needs further clarification.

r dataframe aggregate rounding percentage
2 Answers

0 votes

Wow, who knew someone had already written a package with a function that solves exactly this problem. Kudos to that team: https://cran.r-project.org/web/packages/sfsmisc/index.html
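The answer does not name the function. As far as I know, the relevant one in sfsmisc is `roundfixS()`, which rounds a numeric vector to integers while keeping its sum fixed (it presumes that sum is already a whole number). A minimal sketch, assuming sfsmisc is installed, that applies it to each group with base R's `ave()`, so no loop, subsetting, or join is needed:

```r
# Assumption: the sfsmisc package is installed (install.packages("sfsmisc")).
library(sfsmisc)

# The example data from the question
df <- data.frame(Group = c(rep("A", 7), rep("B", 5)),
                 ID = 1:12,
                 Percent = c(0.413797750, 0.385366840, 0.014417571, 0.060095668,
                             0.076399650, 0.019672573, 0.030249949, 0.381214519,
                             0.084121796, 0.438327886, 0.010665749, 0.085670050),
                 Sum_By_Group = c(rep(7, 7), rep(3, 5)))

# ave() calls roundfixS() once per Group and returns the results in the
# original row order, so the new column lines up with the existing rows.
df$Final <- ave(df$Percent * df$Sum_By_Group, df$Group, FUN = roundfixS)

# Each group's rounded values should add back up to its Sum_By_Group
print(tapply(df$Final, df$Group, sum))
```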


0 votes
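library(dplyr)

df %>%
  # split each product into its integer part and its fractional remainder
  mutate(dividend = floor(Percent * Sum_By_Group),
         remainder = Percent * Sum_By_Group - dividend) %>%
  group_by(Group) %>%
  # within each group, put the largest remainders first
  arrange(desc(remainder), .by_group = TRUE) %>%
  mutate(delivered = sum(dividend),      # units already handed out by floor()
         rownumber = row_number(),
         # one extra unit for each of the first (Sum_By_Group - delivered) rows
         lastdelivery = if_else(rownumber <= Sum_By_Group - delivered, 1, 0),
         Final = dividend + lastdelivery) %>%
  ungroup()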
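The answer above is the largest remainder method: take the floor of every share, then hand the leftover units to the rows with the largest fractional parts. For comparison, here is a minimal base-R sketch of the same idea using `ave()`, which also keeps the original row order instead of sorting by remainder. `round_keep_sum()` is a hypothetical helper name, and it assumes the percentages in each group sum to 1, so the rounded sum of the products equals that group's `Sum_By_Group`.

```r
# Hypothetical helper: largest-remainder rounding for one group's products.
# Assumes the products add up to (approximately) a whole number.
round_keep_sum <- function(x) {
  total <- round(sum(x))           # target integer total for this group
  out <- floor(x)                  # start from the integer parts
  shortfall <- total - sum(out)    # units still to hand out
  if (shortfall > 0) {
    # one extra unit for each of the largest fractional parts
    idx <- order(x - out, decreasing = TRUE)[seq_len(shortfall)]
    out[idx] <- out[idx] + 1
  }
  out
}

# The example data from the question
df <- data.frame(Group = c(rep("A", 7), rep("B", 5)),
                 ID = 1:12,
                 Percent = c(0.413797750, 0.385366840, 0.014417571, 0.060095668,
                             0.076399650, 0.019672573, 0.030249949, 0.381214519,
                             0.084121796, 0.438327886, 0.010665749, 0.085670050),
                 Sum_By_Group = c(rep(7, 7), rep(3, 5)))

# ave() applies the helper per Group and returns values in the original row order
df$Final <- ave(df$Percent * df$Sum_By_Group, df$Group, FUN = round_keep_sum)
print(df)
```

With the question's data this should give 3, 3, 0, 0, 1, 0, 0 for group A and 1, 0, 2, 0, 0 for group B: ID 10 has the largest remainder in group B (about 0.315), so it receives the third unit.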