Here is part of my data:
library(dplyr)

data_x <- tribble(
  ~price, ~bokey, ~id, ~cost, ~revenue,
  1, "a", 10, 0.20,  30,
  2, "b", 20, 0.30,  60,
  3, "c", 20, 0.30,  40,
  4, "d", 10, 0.20, 100,
  5, "e", 30, 0.10,  40,
  6, "f", 10, 0.20,  10,
  1, "g", 20, 0.30,  80,
  2, "h", 10, 0.20,  20,
  3, "h", 30, 0.10,  20,
  3, "i", 20, 0.30,  40
)
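As you can see, there are three different IDs: 10, 20 and 30. In the real data, however, there are almost 100 IDs. I want to aggregate the data by ID. Because I don't know how to do this in a loop, I basically created one subset per ID: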
data_10 <- data_x %>% filter(id == 10)
data_20 <- data_x %>% filter(id == 20)
data_30 <- data_x %>% filter(id == 30)
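For reference, base R's split() produces all of these per-ID subsets in one call, as a named list, instead of one filter() line per ID. A minimal sketch, reusing the example data from above (the name subsets is just illustrative):

```r
library(dplyr)

data_x <- tribble(
  ~price, ~bokey, ~id, ~cost, ~revenue,
  1, "a", 10, 0.20,  30,
  2, "b", 20, 0.30,  60,
  3, "c", 20, 0.30,  40,
  4, "d", 10, 0.20, 100,
  5, "e", 30, 0.10,  40,
  6, "f", 10, 0.20,  10,
  1, "g", 20, 0.30,  80,
  2, "h", 10, 0.20,  20,
  3, "h", 30, 0.10,  20,
  3, "i", 20, 0.30,  40
)

# split() returns a named list with one data frame per distinct id.
# The names are the ids as character strings: "10", "20", "30".
subsets <- split(data_x, data_x$id)

names(subsets)
subsets[["10"]]   # the same rows as data_10 above
```

With 100 IDs this still yields a single object, which can then be processed element by element.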
Here is how I aggregate one subset:
data_agg <- data_10 %>%
  group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>%
  summarise(price_n = n_distinct(bokey),
            Cost    = sum(cost, na.rm = TRUE),
            Revenue = sum(revenue, na.rm = TRUE),
            clicks  = n_distinct(bokey)) %>%
  mutate(price_n2 = round(100 * prop.table(price_n), 2),
         # the parentheses make R name this column `(zet = Cost/Revenue)`
         (zet = Cost/Revenue))
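To see what the binning and percentage steps in this aggregation compute, here is a small sketch of cut() and prop.table() on their own, using the price breaks from the question (the price_n values are those of the id 10 subset, where every bin holds exactly one bokey):

```r
breaks <- c(0, 1, 3, 5, 6)

# cut() puts each price into a right-closed interval (a,b]
bins <- cut(c(1, 2, 3, 4, 5, 6), breaks)
as.character(bins)

# For id 10 each of the four bins holds one distinct bokey, so
# prop.table() gives 1/4 per bin and price_n2 is 25 (percent)
price_n <- c(1L, 1L, 1L, 1L)
price_n2 <- round(100 * prop.table(price_n), 2)
price_n2
```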
However, I would like to add one more column that shows the ID. This is the desired output:
id  priceseg  price_n  Cost  Revenue  clicks  price_n2  (zet = Cost/Revenue)
10  (0,1]     1        0.2   30       1       25        0.00667
10  (1,3]     1        0.2   20       1       25        0.01
10  (3,5]     1        0.2   100      1       25        0.002
10  (5,6]     1        0.2   10       1       25        0.02
20  ...
20  ...
...
30  ...
How can I get this for all IDs at once?
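One option is to split() the data by id and loop over the pieces with map_dfr(), specifying .id so that the list names become an id column: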
library(dplyr)
library(purrr)

data_x %>%
  split(.$id) %>%    # named list: one data frame per id
  map_dfr(~ .x %>%
            group_by(priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>%
            summarise(price_n = n_distinct(bokey),
                      Cost    = sum(cost, na.rm = TRUE),
                      Revenue = sum(revenue, na.rm = TRUE),
                      clicks  = n_distinct(bokey)) %>%
            mutate(price_n2 = round(100 * prop.table(price_n), 2),
                   (zet = Cost/Revenue)),
          .id = "id")        # the list names become the id column
# A tibble: 8 x 8
#   id    priceseg price_n  Cost Revenue clicks price_n2 `(zet = Cost/Revenue)`
#   <chr> <fct>      <int> <dbl>   <dbl>  <int>    <dbl>                  <dbl>
#1 10    (0,1]          1   0.2      30      1       25                0.00667
#2 10    (1,3]          1   0.2      20      1       25                0.01
#3 10    (3,5]          1   0.2     100      1       25                0.002
#4 10    (5,6]          1   0.2      10      1       25                0.02
#5 20    (0,1]          1   0.3      80      1       25                0.00375
#6 20    (1,3]          3   0.900   140      3       75                0.00643
#7 30    (1,3]          1   0.1      20      1       50                0.005
#8 30    (3,5]          1   0.1      40      1       50                0.0025
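As an alternative that is not part of the answer above, split() is not strictly needed: grouping by id and price segment together gives the same numbers in one pipeline. A sketch, assuming dplyr >= 1.0.0 for summarise(.groups = ...). After summarise() the result is still grouped by id, so prop.table() inside mutate() is computed within each id, just as in the per-subset version. The ratio column is named zet here rather than `(zet = Cost/Revenue)`.

```r
library(dplyr)

data_x <- tribble(
  ~price, ~bokey, ~id, ~cost, ~revenue,
  1, "a", 10, 0.20,  30,
  2, "b", 20, 0.30,  60,
  3, "c", 20, 0.30,  40,
  4, "d", 10, 0.20, 100,
  5, "e", 30, 0.10,  40,
  6, "f", 10, 0.20,  10,
  1, "g", 20, 0.30,  80,
  2, "h", 10, 0.20,  20,
  3, "h", 30, 0.10,  20,
  3, "i", 20, 0.30,  40
)

data_agg_all <- data_x %>%
  group_by(id, priceseg = cut(as.numeric(price), c(0, 1, 3, 5, 6))) %>%
  summarise(price_n = n_distinct(bokey),
            Cost    = sum(cost, na.rm = TRUE),
            Revenue = sum(revenue, na.rm = TRUE),
            clicks  = n_distinct(bokey),
            .groups = "drop_last") %>%   # drop priceseg, stay grouped by id
  mutate(price_n2 = round(100 * prop.table(price_n), 2),  # shares within each id
         zet = Cost / Revenue) %>%
  ungroup()

data_agg_all
```

Unlike the map_dfr() version, id stays numeric here, because it is never turned into list names.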
The cut() step could also be replaced with findInterval().
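A sketch of that substitution: findInterval() returns the interval index rather than a factor. Its intervals are closed on the left by default, so left.open = TRUE is needed to reproduce cut()'s (a,b] intervals. The seg_labels vector is an illustrative assumption that mirrors cut()'s default labels for these breaks, and the price vector is copied from the price column of data_x.

```r
price  <- c(1, 2, 3, 4, 5, 6, 1, 2, 3, 3)   # the price column of data_x
breaks <- c(0, 1, 3, 5, 6)

# cut(): factor with right-closed intervals (a,b]
seg_cut <- as.character(cut(price, breaks))

# findInterval(): integer index i with breaks[i] < price <= breaks[i + 1]
# when left.open = TRUE; the labels are chosen to match cut()'s output
seg_labels <- c("(0,1]", "(1,3]", "(3,5]", "(5,6]")
seg_find <- seg_labels[findInterval(price, breaks, left.open = TRUE)]

identical(seg_cut, seg_find)
```

Inside the pipeline, group_by(priceseg = findInterval(price, breaks, left.open = TRUE)) would group by the integer index directly if labels are not needed.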