Looping within each group with dplyr to create a stepwise curve


Problem: I'm working with the following data frame, which represents bids and offers:

PERIOD  CURVETYPE  QUANTITY  PRICE  CURVE_ID
1       SUPPLY     100       0      1
1       DEMAND     500       3000   2
1       SUPPLY     300       63     1
1       DEMAND     200       2900   2
1       SUPPLY     500       52     1
2       DEMAND     700       2800   3
2       DEMAND     100       3000   3
2       SUPPLY     900       73     4
...
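To make the example reproducible, the rows shown in the table can be entered as an R data frame. This is a sketch: the name `original_data` is borrowed from the question's code, and only the eight visible rows are included (the elided `...` rows are not reconstructed).

```r
# The visible sample rows from the question as a data frame
# (the real data set is larger; the "..." rows are not included)
original_data <- data.frame(
  PERIOD    = c(1, 1, 1, 1, 1, 2, 2, 2),
  CURVETYPE = c("SUPPLY", "DEMAND", "SUPPLY", "DEMAND", "SUPPLY",
                "DEMAND", "DEMAND", "SUPPLY"),
  QUANTITY  = c(100, 500, 300, 200, 500, 700, 100, 900),
  PRICE     = c(0, 3000, 63, 2900, 52, 2800, 3000, 73),
  CURVE_ID  = c(1, 2, 1, 2, 1, 3, 3, 4)
)
```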

The CURVE_ID column identifies each curve, i.e. the demand or supply curve of each period. I want to build each curve as a STEPWISE curve (with steps), which means:

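  1. Order by price, from lowest to highest
  2. Cumulative sum of the quantity in Q (the aggregated quantity)
  3. Create the points that represent the STEPs (marked with *)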

For example, for CURVE_ID = 1:

   PERIOD  CURVETYPE  QUANTITY    Q       PRICE  CURVE_ID
   1       SUPPLY     100         100     0      1
*  1       SUPPLY     100         100     52     1
   1       SUPPLY     500         600     52     1
*  1       SUPPLY     500         600     63     1
   1       SUPPLY     300         900     63     1

...
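For a single curve, the three steps can be sketched in base R. This assumes a supply curve (price increasing along the curve) and uses the three `CURVE_ID = 1` rows from the sample data; the variable names are illustrative only. Each step point pairs the cumulative quantity of one row with the price of the next row.

```r
# Base-R sketch of the three steps for one supply curve (CURVE_ID = 1)
price    <- c(0, 63, 52)
quantity <- c(100, 300, 500)

# 1. Order by price, lowest first
ord      <- order(price, quantity)
price    <- price[ord]
quantity <- quantity[ord]

# 2. Cumulative (aggregated) quantity
q <- cumsum(quantity)

# 3. Step points: every cumulative quantity except the last,
#    paired with the next price level
n          <- length(price)
step_q     <- q[-n]
step_price <- price[-1]

# Combine original and step points, then sort along the curve
curve <- data.frame(Q = c(q, step_q), PRICE = c(price, step_price))
curve <- curve[order(curve$Q, curve$PRICE), ]
rownames(curve) <- NULL
curve
```

Sorting by `Q` and then `PRICE` reproduces the order of the expected table only because supply prices rise along the curve; a demand curve would need the price ordering reversed.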

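I have managed the first and second points. The problem is the third one. I wrote code that builds a separate data frame containing the points marked with *, and it works on small data. However, on a large data set it takes a very long time and does not work properly. Here is the code: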

library(dplyr)

# Cumsum to aggregate the quantity
data_frame <- original_data %>%
  filter(CURVETYPE == "SUPPLY") %>%
  group_by(CURVE_ID) %>%
  arrange(PRICE, QUANTITY) %>%
  mutate(Q = cumsum(QUANTITY))
data_rbind <- data_frame[0, ] # empty frame that collects the new step points

for (i in unique(data_frame$CURVE_ID)) {
  data <- filter(data_frame, CURVE_ID == i)
  data <- arrange(data, Q)
  for (j in seq_len(nrow(data))) {
    # Move the price of each step up to the next price (except the highest one)
    if (data$PRICE[j] != max(data$PRICE)) {
      data$PRICE[j] <- data$PRICE[j + 1]
    }
  }
  data_rbind <- rbind(data_rbind, data)
  rm(data)
}
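The inner `j` loop only replaces each price with the next price on the same curve, keeping the last (highest) one. That shift can be done for every group at once with `dplyr::lead()`, which avoids both the per-curve loop and the repeated `rbind()` (growing a data frame inside a loop is a likely cause of the slowdown on large data). A minimal sketch, assuming only supply curves are processed; `supply`, `cumulative`, `step_points` and `step_curve` are hypothetical names:

```r
library(dplyr)

# Supply rows from the question's sample data (assumed input)
supply <- data.frame(
  PERIOD    = c(1, 1, 1, 2),
  CURVETYPE = "SUPPLY",
  QUANTITY  = c(100, 300, 500, 900),
  PRICE     = c(0, 63, 52, 73),
  CURVE_ID  = c(1, 1, 1, 4)
)

# Steps 1 and 2: order each curve by price and accumulate the quantity
cumulative <- supply %>%
  group_by(CURVE_ID) %>%
  arrange(PRICE, QUANTITY, .by_group = TRUE) %>%
  mutate(Q = cumsum(QUANTITY)) %>%
  ungroup()

# Step 3 without loops: lead() moves each price to the next one within
# its curve; the default keeps the last (highest) price, like the j-loop
step_points <- cumulative %>%
  group_by(CURVE_ID) %>%
  mutate(PRICE = lead(PRICE, default = last(PRICE))) %>%
  ungroup()

# Stack both sets, drop the duplicated final point of each curve, sort
step_curve <- bind_rows(cumulative, step_points) %>%
  distinct() %>%
  arrange(CURVE_ID, Q, PRICE)
```

Because the last step point of each curve is identical to its last original row, `distinct()` removes it, so a single-step curve such as `CURVE_ID = 4` keeps exactly one row.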

I'm sure there must be a way to do this with dplyr's group_by. I wrote a function:

f <- function(q, p) {
  maxPrice <- max(p)
  for (i in seq_along(p)) {
    # Replace each price (except the highest) with the next one
    if (p[i] != maxPrice) {
      p[i] <- p[i + 1]
    }
  }
  p # return the shifted prices
}
data_frame <- data_frame %>% group_by(CURVE_ID) %>% f(Q, PRICE)

But I get this error:

Error in f(., Q, PRICE) : unused argument (PRICE)
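The error comes from how `%>%` works: it passes the left-hand side as the first argument, so `data_frame %>% group_by(CURVE_ID) %>% f(Q, PRICE)` is evaluated as `f(<grouped data>, Q, PRICE)`, which gives three arguments to a two-argument function. A function that works on column vectors is normally called inside `mutate()`, where it runs once per group. A sketch with a hypothetical helper `step_price()` and assumed sample rows:

```r
library(dplyr)

# Hypothetical helper: replace each price with the next one on the curve,
# keeping the last (highest) price; assumes prices are already sorted
step_price <- function(p) {
  c(p[-1], p[length(p)])
}

supply <- data.frame(
  QUANTITY = c(100, 300, 500, 900),
  PRICE    = c(0, 63, 52, 73),
  CURVE_ID = c(1, 1, 1, 4)
)

# Column functions go inside mutate(); they then run once per group
result <- supply %>%
  group_by(CURVE_ID) %>%
  arrange(PRICE, .by_group = TRUE) %>%
  mutate(Q = cumsum(QUANTITY),
         STEP_PRICE = step_price(PRICE)) %>%
  ungroup()
```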

1 Answer

Is this what you want? It requires dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>% filter(CURVETYPE == "SUPPLY") %>% 
  group_by(CURVE_ID) %>% 
  arrange(PRICE, QUANTITY) %>% 
  mutate(Q = cumsum(QUANTITY),
         endPrice = lead(PRICE)) %>%                    #add price at end of step
  fill(endPrice) %>%                                    #fill NA final value
  gather(key = key, value = PRICE, PRICE, endPrice) %>% #convert price cols to long format
  select(-key) %>%                                      #remove start/end distinction
  distinct() %>%                                        #remove duplicated last rows of group
  arrange(CURVE_ID, Q)

  PERIOD CURVETYPE QUANTITY CURVE_ID     Q PRICE
   <int> <chr>        <int>    <int> <int> <int>
1      1 SUPPLY         100        1   100     0
2      1 SUPPLY         100        1   100    52
3      1 SUPPLY         500        1   600    52
4      1 SUPPLY         500        1   600    63
5      1 SUPPLY         300        1   900    63
6      2 SUPPLY         900        4   900    73
7      2 SUPPLY         900        4   900    NA
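In the output above, the single-step curve `CURVE_ID = 4` ends up with an `NA` price, because `fill()` has no earlier value to carry down in a one-row group. A variant sketch (not the answerer's code) replaces `fill()` with `coalesce(lead(PRICE), PRICE)` and uses `pivot_longer()`, which current tidyr recommends over the superseded `gather()`. The sample `df` holds only the visible rows from the question:

```r
library(dplyr)
library(tidyr)

# Sample data from the question (visible rows only)
df <- data.frame(
  PERIOD    = c(1, 1, 1, 1, 1, 2, 2, 2),
  CURVETYPE = c("SUPPLY", "DEMAND", "SUPPLY", "DEMAND", "SUPPLY",
                "DEMAND", "DEMAND", "SUPPLY"),
  QUANTITY  = c(100, 500, 300, 200, 500, 700, 100, 900),
  PRICE     = c(0, 3000, 63, 2900, 52, 2800, 3000, 73),
  CURVE_ID  = c(1, 2, 1, 2, 1, 3, 3, 4)
)

steps <- df %>%
  filter(CURVETYPE == "SUPPLY") %>%
  group_by(CURVE_ID) %>%
  arrange(PRICE, QUANTITY, .by_group = TRUE) %>%
  mutate(Q = cumsum(QUANTITY),
         # price at the end of each step; the last step ends at its own price
         endPrice = coalesce(lead(PRICE), PRICE)) %>%
  ungroup() %>%
  # one row per start/end price of each step
  pivot_longer(c(PRICE, endPrice), names_to = "key", values_to = "STEP_PRICE") %>%
  select(-key) %>%
  rename(PRICE = STEP_PRICE) %>%
  distinct() %>%                     # drop the duplicated last point of each curve
  arrange(CURVE_ID, Q, PRICE)
```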