How do I loop over data in R?


Here is part of my data:

library(tibble)  # tribble()
library(dplyr)   # %>%, filter(), group_by(), summarise(), mutate()

data_x <- tribble(
  ~price, ~bokey, ~id, ~cost, ~revenue,
       1,    "a",  10,  0.20,       30,
       2,    "b",  20,  0.30,       60,
       3,    "c",  20,  0.30,       40,
       4,    "d",  10,  0.20,      100,
       5,    "e",  30,  0.10,       40,
       6,    "f",  10,  0.20,       10,
       1,    "g",  20,  0.30,       80,
       2,    "h",  10,  0.20,       20,
       3,    "h",  30,  0.10,       20,
       3,    "i",  20,  0.30,       40
)

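As you can see, there are three different IDs: 10, 20, and 30. In the real data, however, there are almost 100 IDs. I want to aggregate the data by these IDs. Because I don't know how to do this in a loop, I simply created some subsets: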

data_10 <- data_x %>% filter(id == 10)
data_20 <- data_x %>% filter(id == 20)
data_30 <- data_x %>% filter(id == 30)

Here is the aggregation, shown for data_10:

data_agg <- data_10 %>% 
  group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>% 
  summarise(price_n = n_distinct(bokey),
            Cost = sum(cost, na.rm =  T),
            Revenue  =  sum(revenue, na.rm = T),
            clicks = n_distinct(bokey)) %>% 
  mutate(price_n2 = round(100 * prop.table(price_n), 2),
         (zet = Cost/Revenue))

But I want to add one more column showing the ID. This is the desired data:

data_desired <- tribble(
  ~id, ~priceseg, ~price_n, ~Cost,  ~Revenue,  ~clicks,  ~price_n2, ~`(zet = Cost/Revenue)`
   10,   (0,1]        1      0.2        30         1        25             0.00667
   10,   (1,3]        1      0.2        20         1        25                0.01       
   10,   (3,5]        1      0.2       100         1        25               0.002     
   10,   (5,6]        1      0.2        10         1        25                0.02       
   20,
   20,
   .
   .
   30,
)

How can I get this?

r dplyr
1 Answer

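One option is to split the data by id and loop over the pieces with map_dfr, specifying .id so that the names of the list elements become an id column: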

library(dplyr)
library(purrr)
data_x %>% 
  split(.$id) %>%
  map_dfr(~ 
    .x %>%  
      group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>% 
      summarise(price_n = n_distinct(bokey),
                Cost = sum(cost, na.rm = T),
                Revenue = sum(revenue, na.rm = T),
                clicks = n_distinct(bokey)) %>% 
      mutate(price_n2 = round(100 * prop.table(price_n), 2),
             (zet = Cost/Revenue)),
    .id = "id")
# A tibble: 8 x 8
#  id    priceseg price_n  Cost Revenue clicks price_n2 `(zet = Cost/Revenue)`
#  <chr> <fct>      <int> <dbl>   <dbl>  <int>    <dbl>                  <dbl>
#1 10    (0,1]          1 0.2        30      1       25                0.00667
#2 10    (1,3]          1 0.2        20      1       25                0.01   
#3 10    (3,5]          1 0.2       100      1       25                0.002  
#4 10    (5,6]          1 0.2        10      1       25                0.02   
#5 20    (0,1]          1 0.3        80      1       25                0.00375
#6 20    (1,3]          3 0.900     140      3       75                0.00643
#7 30    (1,3]          1 0.1        20      1       50                0.005  
#8 30    (3,5]          1 0.1        40      1       50                0.0025 
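
As an alternative not taken from the answer above, the same table can probably be produced without splitting at all, by grouping on both id and the price segment. The sketch below is a minimal illustration under a few assumptions: it needs dplyr 1.0.0 or later for the .groups argument, it names the ratio column zet instead of `(zet = Cost/Revenue)`, and it keeps id numeric rather than character.

```r
library(tibble)
library(dplyr)

# The question's sample data, repeated so the block runs on its own
data_x <- tribble(
  ~price, ~bokey, ~id, ~cost, ~revenue,
       1,    "a",  10,  0.20,       30,
       2,    "b",  20,  0.30,       60,
       3,    "c",  20,  0.30,       40,
       4,    "d",  10,  0.20,      100,
       5,    "e",  30,  0.10,       40,
       6,    "f",  10,  0.20,       10,
       1,    "g",  20,  0.30,       80,
       2,    "h",  10,  0.20,       20,
       3,    "h",  30,  0.10,       20,
       3,    "i",  20,  0.30,       40
)

data_agg <- data_x %>%
  # Group by id AND price segment instead of filtering one id at a time
  group_by(id, priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>%
  summarise(price_n = n_distinct(bokey),
            Cost    = sum(cost, na.rm = TRUE),
            Revenue = sum(revenue, na.rm = TRUE),
            clicks  = n_distinct(bokey),
            # Drop only the priceseg level so the result stays grouped by id
            .groups = "drop_last") %>%
  # prop.table() is therefore computed within each id, as in the split version
  mutate(price_n2 = round(100 * prop.table(price_n), 2),
         zet = Cost / Revenue) %>%
  ungroup()

data_agg
```

Because the result is still grouped by id when mutate() runs, the percentages in price_n2 sum to 100 within each id, as they do in the split/map_dfr output.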

The cut step can also be replaced with findInterval.
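
A minimal sketch of how that substitution might look, assuming the same break points as above. left.open = TRUE is used here so that the intervals are closed on the right, matching the default of cut; that choice is an assumption about the intended behaviour, not necessarily the exact code the answer had in mind. Note that findInterval returns integer bin indices rather than a labelled factor.

```r
breaks <- c(0, 1, 3, 5, 6)
price  <- c(1, 2, 3, 4, 5, 6)

# cut() returns a factor with labels such as "(0,1]" and "(1,3]"
seg_cut <- cut(price, breaks)

# findInterval() returns the index of the interval each value falls into;
# left.open = TRUE makes the intervals (lower, upper], like cut()'s default
seg_fi <- findInterval(price, breaks, left.open = TRUE)

# Side-by-side comparison of the two binnings
data.frame(price, seg_cut, seg_fi)
```

Inside the pipeline, this would mean grouping by priceseg = findInterval(as.numeric(price), c(0, 1, 3, 5, 6), left.open = TRUE), at the cost of losing the readable interval labels.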
