How can I subtract values in R depending on whether a particular column is tagged with one or more IDs?

Note: this is somewhat related to a question I asked here previously.

Here is a subset of my data, as an example:

library(dplyr)

DF <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19), day = c("day1", "day2", "day3", 
"day4", "day5", "day6", "day6", "day7", "day8", "day9", "day10", 
"day10", "day11", "day12", "day13", "day14", "day14", "day14", 
"day14"), sent_to = c(NA, NA, "Blue Superstore", "Garden Cinema", 
"Pasta House", NA, NA, "Pizzaria", NA, "Ice Palace", NA, NA, 
"Shoes Centre", "Dreams Dessert", NA, "Chicken World", "Art Gallery", 
"Smoothie Hut", NA), received_from = c("ATM", "Sarah", NA, NA, 
NA, "Jane", "Joe", NA, "Sarah", NA, "Anna", "Jane", NA, NA, "Anna", 
NA, NA, NA, "Joe"), reference = c("add_cash", "gift", "shopping", 
"cinema_tickets", "meal", "reimbursed", "reimbursed", "meal", 
"reimbursed", "ice_rink_tickets", "reimbursed", "reimbursed", 
"shoes", "ice_cream", "reimbursed", "meal", "gallery_ticket", 
"drink", "reimbursed"), decrease = c(0, 0, 15.2, 10.8, 12.5, 
0, 0, 10, 0, 18, 0, 0, 15, 6.5, 0, 8, 3.5, 2, 0), increase = c(50, 
30, 0, 0, 0, 5.4, 7.25, 0, 10, 0, 6, 6, 0, 0, 21.5, 0, 0, 0, 
13.5), reimbursed_id = c(NA, NA, NA, "R", "R", "4", "5", "R", 
"8", "R", "10", "10", "R", "R", "13, 14", "R", "R", "R", "16, 17, 18"
), change = c(50, 30, -15.2, -10.8, -12.5, 5.4, 7.25, -10, 10, 
-18, 6, 6, -15, -6.5, 21.5, -8, -3.5, -2, 13.5), balance = c(50, 
80, 64.8, 54, 41.5, 46.9, 54.15, 44.15, 54.15, 36.15, 42.15, 
48.15, 33.15, 26.65, 48.15, 40.15, 36.65, 34.65, 48.15)), row.names = c(NA, 
-19L), class = c("tbl_df", "tbl", "data.frame"))
> DF
# A tibble: 19 × 10
      id day   sent_to         received_from reference        decrease increase reimbursed_id change balance
   <dbl> <chr> <chr>           <chr>         <chr>               <dbl>    <dbl> <chr>          <dbl>   <dbl>
 1     1 day1  NA              ATM           add_cash              0      50    NA             50       50  
 2     2 day2  NA              Sarah         gift                  0      30    NA             30       80  
 3     3 day3  Blue Superstore NA            shopping             15.2     0    NA            -15.2     64.8
 4     4 day4  Garden Cinema   NA            cinema_tickets       10.8     0    R             -10.8     54  
 5     5 day5  Pasta House     NA            meal                 12.5     0    R             -12.5     41.5
 6     6 day6  NA              Jane          reimbursed            0       5.4  4               5.4     46.9
 7     7 day6  NA              Joe           reimbursed            0       7.25 5               7.25    54.2
 8     8 day7  Pizzaria        NA            meal                 10       0    R             -10       44.2
 9     9 day8  NA              Sarah         reimbursed            0      10    8              10       54.2
10    10 day9  Ice Palace      NA            ice_rink_tickets     18       0    R             -18       36.2
11    11 day10 NA              Anna          reimbursed            0       6    10              6       42.2
12    12 day10 NA              Jane          reimbursed            0       6    10              6       48.2
13    13 day11 Shoes Centre    NA            shoes                15       0    R             -15       33.2
14    14 day12 Dreams Dessert  NA            ice_cream             6.5     0    R              -6.5     26.6
15    15 day13 NA              Anna          reimbursed            0      21.5  13, 14         21.5     48.2
16    16 day14 Chicken World   NA            meal                  8       0    R              -8       40.2
17    17 day14 Art Gallery     NA            gallery_ticket        3.5     0    R              -3.5     36.6
18    18 day14 Smoothie Hut    NA            drink                 2       0    R              -2       34.6
19    19 day14 NA              Joe           reimbursed            0      13.5  16, 17, 18     13.5     48.2

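Explanation of the reimbursed_id column:

  • R means the value in the decrease column does not represent the user's actual spending, because it includes an amount paid on someone else's behalf
  • 4 (or any single number) is the id of the transaction the user is being reimbursed for (the borrowed amount being paid back)
  • 13, 14 (or any comma-separated list of numbers) are the ids the user is being reimbursed for when one reimbursement spans multiple transactions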
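
The three kinds of reimbursed_id value described above can be brought into one shape by splitting the comma-separated lists so that each reimbursed id gets its own row. A minimal sketch, assuming tidyr is installed; the three-row tibble and the names reimb, reimb_long and target_id are made up for illustration and are not part of the original data:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the "reimbursed" rows 9, 15 and 19 of DF
reimb <- tibble(
  id            = c(9, 15, 19),
  increase      = c(10, 21.5, 13.5),
  reimbursed_id = c("8", "13, 14", "16, 17, 18")
)

# One row per reimbursed id; a single id such as "8" passes through unchanged
reimb_long <- reimb %>%
  separate_rows(reimbursed_id, sep = ",\\s*") %>%
  mutate(target_id = as.numeric(reimbursed_id))

reimb_long
```

Note that increase is repeated on every split row, so it still has to be divided among the target ids before it can be subtracted from anything.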

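Desired result:

I would like to add an actual_decrease column to this dataset. It should look at the reimbursed_id column, note which IDs affect other rows, collect the reimbursed amounts for those rows from the increase column, and subtract them from the value in decrease for the corresponding ID.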

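More details:

Please refer to the image below (it shows what I would like the actual_decrease column to look like):

As you can see in the screenshot, several different types of calculation are applied to the rows, depending on the contents of the reimbursed_id column.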

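If a row is tagged "R", the calculation of actual_decrease depends on whether the reimbursement covers:

  • the full amount (one ID)
  • part of the full amount (one ID)
  • the full amount, but through multiple transactions (multiple IDs)
  • part of the full amount, but through multiple transactions (multiple IDs)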

If there is no "R" tag, actual_decrease is simply the value in decrease.
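
The cases above can be handled without a loop by splitting the ID lists, attributing each reimbursement to its target rows, and joining the totals back. This is only a sketch under stated assumptions, not a confirmed answer: DF_small is a cut-down copy of DF, the names source_id, target_decrease, share and reimbursed are invented here, and the question does not say how a partial reimbursement that spans several IDs should be divided, so the sketch assumes a split in proportion to each target row's decrease.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of DF with only the columns the calculation needs
DF_small <- tibble(
  id = as.numeric(1:19),
  reference = c("add_cash", "gift", "shopping", "cinema_tickets", "meal",
                "reimbursed", "reimbursed", "meal", "reimbursed",
                "ice_rink_tickets", "reimbursed", "reimbursed", "shoes",
                "ice_cream", "reimbursed", "meal", "gallery_ticket", "drink",
                "reimbursed"),
  decrease = c(0, 0, 15.2, 10.8, 12.5, 0, 0, 10, 0, 18, 0, 0, 15, 6.5, 0,
               8, 3.5, 2, 0),
  increase = c(50, 30, 0, 0, 0, 5.4, 7.25, 0, 10, 0, 6, 6, 0, 0, 21.5, 0,
               0, 0, 13.5),
  reimbursed_id = c(NA, NA, NA, "R", "R", "4", "5", "R", "8", "R", "10",
                    "10", "R", "R", "13, 14", "R", "R", "R", "16, 17, 18")
)

# Total amount reimbursed for each original transaction id
reimbursed_per_id <- DF_small %>%
  filter(reference == "reimbursed") %>%
  select(source_id = id, increase, reimbursed_id) %>%
  separate_rows(reimbursed_id, sep = ",\\s*") %>%   # "13, 14" -> two rows
  mutate(id = as.numeric(reimbursed_id)) %>%
  left_join(select(DF_small, id, target_decrease = decrease), by = "id") %>%
  group_by(source_id) %>%
  # ASSUMPTION: one reimbursement is split pro rata to the targets' decrease
  mutate(share = increase * target_decrease / sum(target_decrease)) %>%
  group_by(id) %>%
  summarise(reimbursed = sum(share), .groups = "drop")

# Rows that were never reimbursed keep their full decrease
result <- DF_small %>%
  left_join(reimbursed_per_id, by = "id") %>%
  mutate(actual_decrease = decrease - coalesce(reimbursed, 0)) %>%
  select(-reimbursed)

result
```

On the sample data this reproduces the values listed under the edit at the end of the question. The pro rata split would give NaN if every target of a reimbursement had a decrease of 0, so real data may need a guard for that case.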


So far I only have the following (based on a question I asked previously):

DF %>%
  left_join(DF %>%
              filter(reference == "reimbursed") %>%
              group_by(id = as.numeric(reimbursed_id)) %>% # removes row 15 and 19 (contains comma-separated values)
              summarise(actual_decrease = sum(increase)),
            by = "id") %>%
  mutate(actual_decrease = ifelse(!is.na(actual_decrease),
                                  decrease - actual_decrease,
                                  decrease))

# A tibble: 19 × 11
      id day   sent_to         received_from reference        decrease increase reimbursed_id change balance actual_decrease
   <dbl> <chr> <chr>           <chr>         <chr>               <dbl>    <dbl> <chr>          <dbl>   <dbl>           <dbl>
 1     1 day1  NA              ATM           add_cash              0      50    NA             50       50              0   
 2     2 day2  NA              Sarah         gift                  0      30    NA             30       80              0   
 3     3 day3  Blue Superstore NA            shopping             15.2     0    NA            -15.2     64.8           15.2 
 4     4 day4  Garden Cinema   NA            cinema_tickets       10.8     0    R             -10.8     54              5.4 
 5     5 day5  Pasta House     NA            meal                 12.5     0    R             -12.5     41.5            5.25
 6     6 day6  NA              Jane          reimbursed            0       5.4  4               5.4     46.9            0   
 7     7 day6  NA              Joe           reimbursed            0       7.25 5               7.25    54.2            0   
 8     8 day7  Pizzaria        NA            meal                 10       0    R             -10       44.2            0   
 9     9 day8  NA              Sarah         reimbursed            0      10    8              10       54.2            0   
10    10 day9  Ice Palace      NA            ice_rink_tickets     18       0    R             -18       36.2            6   
11    11 day10 NA              Anna          reimbursed            0       6    10              6       42.2            0   
12    12 day10 NA              Jane          reimbursed            0       6    10              6       48.2            0   
13    13 day11 Shoes Centre    NA            shoes                15       0    R             -15       33.2           15   
14    14 day12 Dreams Dessert  NA            ice_cream             6.5     0    R              -6.5     26.6            6.5 
15    15 day13 NA              Anna          reimbursed            0      21.5  13, 14         21.5     48.2            0   
16    16 day14 Chicken World   NA            meal                  8       0    R              -8       40.2            8   
17    17 day14 Art Gallery     NA            gallery_ticket        3.5     0    R              -3.5     36.6            3.5 
18    18 day14 Smoothie Hut    NA            drink                 2       0    R              -2       34.6            2   
19    19 day14 NA              Joe           reimbursed            0      13.5  16, 17, 18     13.5     48.2            0   

However, this code does not produce the actual_decrease values I want for all of the calculation types; that is, the output is incorrect from row 13 onwards.
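
The affected rows are lost because as.numeric() cannot parse a string such as "13, 14": it returns NA (with a coercion warning), so those reimbursements fall into an NA group that matches no id in the join. A small illustration using values from the data; the names ids and parsed are only for this example:

```r
ids <- c("4", "10", "13, 14", "16, 17, 18")

# Comma-separated strings are not valid numbers, so they become NA
parsed <- suppressWarnings(as.numeric(ids))
parsed
#> [1]  4 10 NA NA
```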


Because my actual dataset is very large, I would rather avoid using loops.

Any help would be much appreciated :)


Edit: This is what I would like the dataset to look like:

# A tibble: 19 × 9
      id day   sent_to         received_from reference        decrease increase reimbursed_id actual_decrease
   <dbl> <chr> <chr>           <chr>         <chr>               <dbl>    <dbl> <chr>                   <dbl>
 1     1 day1  NA              ATM           add_cash              0      50    NA                       0   
 2     2 day2  NA              Sarah         gift                  0      30    NA                       0   
 3     3 day3  Blue Superstore NA            shopping             15.2     0    NA                      15.2 
 4     4 day4  Garden Cinema   NA            cinema_tickets       10.8     0    R                        5.4 
 5     5 day5  Pasta House     NA            meal                 12.5     0    R                        5.25
 6     6 day6  NA              Jane          reimbursed            0       5.4  4                        0   
 7     7 day6  NA              Joe           reimbursed            0       7.25 5                        0   
 8     8 day7  Pizzaria        NA            meal                 10       0    R                        0   
 9     9 day8  NA              Sarah         reimbursed            0      10    8                        0   
10    10 day9  Ice Palace      NA            ice_rink_tickets     18       0    R                        6   
11    11 day10 NA              Anna          reimbursed            0       6    10                       0   
12    12 day10 NA              Jane          reimbursed            0       6    10                       0   
13    13 day11 Shoes Centre    NA            shoes                15       0    R                        0   
14    14 day12 Dreams Dessert  NA            ice_cream             6.5     0    R                        0   
15    15 day13 NA              Anna          reimbursed            0      21.5  13, 14                   0   
16    16 day14 Chicken World   NA            meal                  8       0    R                        0   
17    17 day14 Art Gallery     NA            gallery_ticket        3.5     0    R                        0   
18    18 day14 Smoothie Hut    NA            drink                 2       0    R                        0   
19    19 day14 NA              Joe           reimbursed            0      13.5  16, 17, 18               0 
r dataframe dplyr tibble mutate