Question: I am working with the following data frame, which represents bids and offers:
PERIOD CURVETYPE QUANTITY PRICE CURVE_ID
1 SUPPLY 100 0 1
1 DEMAND 500 3000 2
1 SUPPLY 300 63 1
1 DEMAND 200 2900 2
1 SUPPLY 500 52 1
2 DEMAND 700 2800 3
2 DEMAND 100 3000 3
2 SUPPLY 900 73 4
...
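The CURVE_ID column identifies the curve: one demand curve and one supply curve per period. What I want is to build each curve as a STEPWISE curve (with steps), which means:
For example, for CURVE_ID = 1: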
PERIOD CURVETYPE QUANTITY Q PRICE CURVE_ID
1 SUPPLY 100 100 0 1
* 1 SUPPLY 100 100 52 1
1 SUPPLY 500 600 52 1
* 1 SUPPLY 500 600 63 1
1 SUPPLY 300 900 63 1
...
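I can already do the first and second points. The problem is the third. I wrote code that builds another data frame containing the points marked with *, and it works on small data. On large data, however, it takes a very long time and does not work properly. The code is as follows: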
# Cumulative sum to aggregate the quantity
data_frame <- filter(original_data, CURVETYPE == "SUPPLY") %>% group_by(CURVE_ID) %>% arrange(PRICE, Q) %>% mutate(QUANTITY = cumsum(Q))
data_rbind <- data_frame[1, ] # Here I'm going to bind my data_frame with the new points
for (i in data_frame$CURVE_ID) {
  data <- filter(data_frame, CURVE_ID == i)
  data <- data %>% arrange(QUANTITY)
  for (j in 1:nrow(data)) {
    if (data[j, "PRICE"] != max(data$PRICE)) {
      data[j, "PRICE"] <- data[j + 1, "PRICE"]
    }
  }
  data_rbind <- rbind(data_rbind, data)
  rm(data)
}
I am sure there must be a way to do this with dplyr's group_by. I wrote a function:
f<- function(q,p){
maxPrice <- max(p)
for (i in 1:nrow()){
if (i!=which(p==maxPrice)){
p[i] <- p[i+1]
}
}
}
data_frame <- data_frame %>% group_by(CURVE_ID) %>% f(Q,PRICE)
But I get the error:
Error in f(., Q, PRICE) : unused argument (PRICE)
Is this what you want? It requires dplyr and tidyr.
df %>% filter(CURVETYPE == "SUPPLY") %>%
group_by(CURVE_ID) %>%
arrange(PRICE, QUANTITY) %>%
mutate(Q = cumsum(QUANTITY),
endPrice = lead(PRICE)) %>% #add price at end of step
fill(endPrice) %>% #fill NA final value
gather(key = key, value = PRICE, PRICE, endPrice) %>% #convert price cols to long format
select(-key) %>% #remove start/end distinction
distinct() %>% #remove duplicated last rows of group
arrange(CURVE_ID, Q)
PERIOD CURVETYPE QUANTITY CURVE_ID Q PRICE
<int> <chr> <int> <int> <int> <int>
1 1 SUPPLY 100 1 100 0
2 1 SUPPLY 100 1 100 52
3 1 SUPPLY 500 1 600 52
4 1 SUPPLY 500 1 600 63
5 1 SUPPLY 300 1 900 63
6 2 SUPPLY 900 4 900 73
7 2 SUPPLY 900 4 900 NA
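Note that row 7 has PRICE = NA: curve 4 has a single row, so lead(PRICE) is NA and fill() has no earlier value in the group to copy down. The following is a minimal sketch of one way to avoid that, not the code above verbatim. It assumes it is acceptable to set the end price of the last step to the group's own last price via lead(PRICE, default = last(PRICE)), and it swaps gather() for pivot_longer(). The df built here just reproduces the rows shown in the question. As an aside, the "unused argument" error in the question arises because %>% passes the grouped data frame as the first argument of f, so the call becomes f(., Q, PRICE) with three arguments. Doing the per-group work inside mutate() sidesteps that.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question
df <- data.frame(
  PERIOD    = c(1, 1, 1, 1, 1, 2, 2, 2),
  CURVETYPE = c("SUPPLY", "DEMAND", "SUPPLY", "DEMAND",
                "SUPPLY", "DEMAND", "DEMAND", "SUPPLY"),
  QUANTITY  = c(100, 500, 300, 200, 500, 700, 100, 900),
  PRICE     = c(0, 3000, 63, 2900, 52, 2800, 3000, 73),
  CURVE_ID  = c(1, 2, 1, 2, 1, 3, 3, 4)
)

steps <- df %>%
  filter(CURVETYPE == "SUPPLY") %>%
  group_by(CURVE_ID) %>%
  arrange(PRICE, QUANTITY, .by_group = TRUE) %>%
  mutate(Q = cumsum(QUANTITY),                           # cumulative quantity per curve
         endPrice = lead(PRICE, default = last(PRICE))) %>% # price at end of step, never NA
  ungroup() %>%
  pivot_longer(c(PRICE, endPrice),
               names_to = "key", values_to = "PRICE") %>% # start/end prices to long format
  select(-key) %>%                                       # drop start/end distinction
  distinct() %>%                                         # last step of each curve is duplicated
  arrange(CURVE_ID, Q, PRICE)

print(steps)
```

With this variant a single-row curve such as CURVE_ID = 4 yields one row at its own price instead of an extra NA row.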