Combining 2 consecutive but irregular time series datasets with overlapping rows, eliminating the duplicated rows


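Using R, I am trying to combine 2 consecutive but irregular time series datasets that have the same fields but overlapping rows; that is, some of the same trades appear in both datasets, and I want to eliminate the overlapping rows.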

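Because the time intervals are irregular, each dataset may legitimately contain identical rows. For my example datasets, I want to combine rows 1 to 12 of dataset 1 with rows 6 to 11 of dataset 2 to get the desired result. In this example, it is clear that rows 1 to 5 of dataset 2 are the same as rows 8 to 12 of dataset 1. I tried the unique() function, but it also removes the legitimately identical rows. Any ideas on how to solve this?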

Dataset 1

1  2019-02-19 15:17:14 25886    1                           
2  2019-02-19 15:17:14 25886    1                           
3  2019-02-19 15:17:15 25885    1                           
4  2019-02-19 15:17:16 25886    2                           
5  2019-02-19 15:17:16 25886    1                           
6  2019-02-19 15:17:16 25886    2                           
7  2019-02-19 15:17:16 25886    1                           
8  2019-02-19 15:17:18 25885    4                           
9  2019-02-19 15:17:19 25885    1  
10 2019-02-19 15:17:19 25885    1                            
11 2019-02-19 15:17:20 25885    2                           
12 2019-02-19 15:17:21 25885    1                           

Dataset 2

1  2019-02-19 15:17:18 25885    4                           
2  2019-02-19 15:17:19 25885    1  
3  2019-02-19 15:17:19 25885    1                          
4  2019-02-19 15:17:20 25885    2                           
5  2019-02-19 15:17:21 25885    1                           
6  2019-02-19 15:17:23 25886    2                           
7  2019-02-19 15:17:23 25886    3                           
8  2019-02-19 15:17:23 25886    3                           
9  2019-02-19 15:17:23 25886    1                           
10 2019-02-19 15:17:23 25886    1                           
11 2019-02-19 15:17:23 25886    2 

The result I want is:

1  2019-02-19 15:17:14 25886    1                           
2  2019-02-19 15:17:14 25886    1                           
3  2019-02-19 15:17:15 25885    1                           
4  2019-02-19 15:17:16 25886    2                           
5  2019-02-19 15:17:16 25886    1                           
6  2019-02-19 15:17:16 25886    2                           
7  2019-02-19 15:17:16 25886    1                           
8  2019-02-19 15:17:18 25885    4                           
9  2019-02-19 15:17:19 25885    1   
10 2019-02-19 15:17:19 25885    1                             
11 2019-02-19 15:17:20 25885    2                           
12 2019-02-19 15:17:21 25885    1                      
13 2019-02-19 15:17:23 25886    2                           
14 2019-02-19 15:17:23 25886    3                           
15 2019-02-19 15:17:23 25886    3                           
16 2019-02-19 15:17:23 25886    1                           
17 2019-02-19 15:17:23 25886    1                           
18 2019-02-19 15:17:23 25886    2 

Here is dataset 1 (dput output):

structure(list(time = structure(c(1550589434, 1550589434, 1550589435, 
1550589436, 1550589436, 1550589436, 1550589436, 1550589438, 1550589439, 
1550589439, 1550589440, 1550589441), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), price = c(25886, 25886, 25885, 25886, 25886, 
25886, 25886, 25885, 25885, 25885, 25885, 25885), size = c(1, 
1, 1, 2, 1, 2, 1, 4, 1, 1, 2, 1)), row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")

Here is dataset 2 (dput output):

structure(list(time = structure(c(1550589438, 1550589439, 1550589439, 
1550589440, 1550589441, 1550589443, 1550589443, 1550589443, 1550589443, 
1550589443, 1550589443), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
price = c(25885, 25885, 25885, 25885, 25885, 25886, 25886, 
25886, 25886, 25886, 25886), size = c(4, 1, 1, 2, 1, 2, 3, 
3, 1, 1, 2)), row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10", "11"), class = "data.frame")
1 answer

One idea is to use anti_join() from dplyr to drop the rows of df2 that already appear in df1:

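library(dplyr)

# Keep df1 first so the result stays in time order, then append the rows
# of df2 that have no match in df1 (duplicates within df2 are kept).
# Caveat: anti_join() drops every df2 row that matches any df1 row, so a
# genuine later trade identical to an earlier one would also be removed.
df1 %>%
  bind_rows(anti_join(df2, df1, by = c("time", "price", "size")))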
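If the second dataset always continues where the first one ends, another option is to look only at the boundary: find the longest run of rows at the end of df1 that is repeated at the start of df2, and drop just that run. The sketch below uses base R only. The helper names `row_key` and `combine_overlap` are made up for illustration, and taking the longest matching run as the true overlap is an assumption: if the same trade repeats right at the boundary, the data alone cannot tell how many rows really overlap. It also assumes both data frames have the same columns in the same order.

```r
# Example data rebuilt from the question (times as offsets from the first trade).
t0 <- 1550589434
df1 <- data.frame(
  time  = as.POSIXct(t0 + c(0, 0, 1, 2, 2, 2, 2, 4, 5, 5, 6, 7),
                     origin = "1970-01-01", tz = "UTC"),
  price = c(25886, 25886, 25885, 25886, 25886, 25886, 25886,
            25885, 25885, 25885, 25885, 25885),
  size  = c(1, 1, 1, 2, 1, 2, 1, 4, 1, 1, 2, 1)
)
df2 <- data.frame(
  time  = as.POSIXct(t0 + c(4, 5, 5, 6, 7, 9, 9, 9, 9, 9, 9),
                     origin = "1970-01-01", tz = "UTC"),
  price = c(25885, 25885, 25885, 25885, 25885,
            25886, 25886, 25886, 25886, 25886, 25886),
  size  = c(4, 1, 1, 2, 1, 2, 3, 3, 1, 1, 2)
)

# Hypothetical helper: one string per row, built from all columns.
row_key <- function(d) do.call(paste, c(d, sep = "|"))

# Hypothetical helper: append b to a, dropping the longest prefix of b
# that is identical to a suffix of a.
combine_overlap <- function(a, b) {
  ka <- row_key(a)
  kb <- row_key(b)
  k <- 0
  for (n in seq_len(min(length(ka), length(kb)))) {
    # Later (larger) matches overwrite earlier ones, so k ends up as the longest.
    if (identical(tail(ka, n), head(kb, n))) k <- n
  }
  out <- rbind(a, b[seq_len(nrow(b)) > k, , drop = FALSE])
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

combined <- combine_overlap(df1, df2)
```

Unlike the anti_join() approach, this never removes a row from the non-overlapping part of df2, even if it happens to equal a row somewhere in df1.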