Here are two DataFrames, each grouped the way I want:
import numpy as np
import pandas as pd

# Five consecutive one-second timestamps, starting five seconds ago.
last5s = pd.Timestamp.now().replace(microsecond=0) - pd.Timedelta('5s')
dates = pd.date_range(last5s, periods=5, freq='s')

N = 10
data1 = np.random.randint(0, 10, N)
data2 = np.random.randint(0, 10, N)
# Each row gets a timestamp drawn at random from those five seconds.
df1 = pd.DataFrame({'timestamp': np.random.choice(dates, size=N), 'A': data1})
df2 = pd.DataFrame({'timestamp': np.random.choice(dates, size=N), 'B': data2})
print(df1)
print(df2)
print()

# Group each frame into one-second bins.
g1 = df1.groupby(pd.Grouper(key='timestamp', freq='1s'))
print('g1:')
for time, group in g1:
    print('time:', time)
    print(group)
    print()
print()

g2 = df2.groupby(pd.Grouper(key='timestamp', freq='1s'))
print('g2:')
for time, group in g2:
    print('time:', time)
    print(group)
    print()
Output (for example):
timestamp A
0 2024-03-01 10:05:26 7
1 2024-03-01 10:05:25 8
2 2024-03-01 10:05:28 1
3 2024-03-01 10:05:24 2
4 2024-03-01 10:05:28 5
5 2024-03-01 10:05:27 4
6 2024-03-01 10:05:24 6
7 2024-03-01 10:05:26 3
8 2024-03-01 10:05:26 8
9 2024-03-01 10:05:28 8
timestamp B
0 2024-03-01 10:05:25 1
1 2024-03-01 10:05:26 6
2 2024-03-01 10:05:25 5
3 2024-03-01 10:05:28 7
4 2024-03-01 10:05:27 7
5 2024-03-01 10:05:28 1
6 2024-03-01 10:05:28 4
7 2024-03-01 10:05:25 0
8 2024-03-01 10:05:24 6
9 2024-03-01 10:05:24 5
g1:
time: 2024-03-01 10:05:24
timestamp A
3 2024-03-01 10:05:24 2
6 2024-03-01 10:05:24 6
time: 2024-03-01 10:05:25
timestamp A
1 2024-03-01 10:05:25 8
time: 2024-03-01 10:05:26
timestamp A
0 2024-03-01 10:05:26 7
7 2024-03-01 10:05:26 3
8 2024-03-01 10:05:26 8
time: 2024-03-01 10:05:27
timestamp A
5 2024-03-01 10:05:27 4
time: 2024-03-01 10:05:28
timestamp A
2 2024-03-01 10:05:28 1
4 2024-03-01 10:05:28 5
9 2024-03-01 10:05:28 8
g2:
time: 2024-03-01 10:05:24
timestamp B
8 2024-03-01 10:05:24 6
9 2024-03-01 10:05:24 5
time: 2024-03-01 10:05:25
timestamp B
0 2024-03-01 10:05:25 1
2 2024-03-01 10:05:25 5
7 2024-03-01 10:05:25 0
time: 2024-03-01 10:05:26
timestamp B
1 2024-03-01 10:05:26 6
time: 2024-03-01 10:05:27
timestamp B
4 2024-03-01 10:05:27 7
time: 2024-03-01 10:05:28
timestamp B
3 2024-03-01 10:05:28 7
5 2024-03-01 10:05:28 1
6 2024-03-01 10:05:28 4
How can I "join" these groups so that I can iterate over them together? For example, I'd like to be able to write:
for time, group1, group2 in somehow_joined(g1, g2):
    ...  # do stuff with group1 and group2 for this common time group
One option is to use itertools.groupby:
from itertools import groupby

g1 = df1.groupby(pd.Grouper(key="timestamp", freq="1s"))
g2 = df2.groupby(pd.Grouper(key="timestamp", freq="1s"))

# Pool the (time, group) pairs from both groupbys, sort them by time,
# then regroup them so pairs that share a time end up together.
for t, g in groupby(sorted([*g1, *g2], key=lambda k: k[0]), lambda k: k[0]):
    print(t)
    print("-" * 80)
    for _, group in g:
        print(group)
    print()
Prints (for example):
2024-03-01 00:14:25
--------------------------------------------------------------------------------
timestamp A
7 2024-03-01 00:14:25 0
9 2024-03-01 00:14:25 7
timestamp B
1 2024-03-01 00:14:25 0
3 2024-03-01 00:14:25 4
7 2024-03-01 00:14:25 1
2024-03-01 00:14:26
--------------------------------------------------------------------------------
timestamp A
5 2024-03-01 00:14:26 5
timestamp B
2 2024-03-01 00:14:26 4
5 2024-03-01 00:14:26 0
6 2024-03-01 00:14:26 9
2024-03-01 00:14:27
--------------------------------------------------------------------------------
timestamp A
0 2024-03-01 00:14:27 4
4 2024-03-01 00:14:27 6
timestamp B
4 2024-03-01 00:14:27 4
8 2024-03-01 00:14:27 6
2024-03-01 00:14:28
--------------------------------------------------------------------------------
timestamp A
1 2024-03-01 00:14:28 5
8 2024-03-01 00:14:28 8
timestamp B
0 2024-03-01 00:14:28 0
2024-03-01 00:14:29
--------------------------------------------------------------------------------
timestamp A
2 2024-03-01 00:14:29 5
3 2024-03-01 00:14:29 4
6 2024-03-01 00:14:29 1
timestamp B
9 2024-03-01 00:14:29 6
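If you want the exact `for time, group1, group2 in ...` shape from the question, a generator along the following lines might do. This is only a sketch under a few assumptions: `joined_groups` is a hypothetical helper name (not a pandas function), it takes the two DataFrames rather than the groupby objects, and it hands back an empty frame when one side has no rows for a given time.

```python
import pandas as pd


def joined_groups(df1, df2, key="timestamp", freq="1s"):
    """Yield (time, group1, group2) for every time bin seen in either frame.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Materialize each groupby as a {time: sub-frame} dictionary.
    groups1 = {t: g for t, g in df1.groupby(pd.Grouper(key=key, freq=freq))}
    groups2 = {t: g for t, g in df2.groupby(pd.Grouper(key=key, freq=freq))}
    # Empty frames with the right columns, used when one side lacks a bin.
    empty1, empty2 = df1.iloc[0:0], df2.iloc[0:0]
    # Walk the union of the time keys in chronological order.
    for t in sorted(groups1.keys() | groups2.keys()):
        yield t, groups1.get(t, empty1), groups2.get(t, empty2)


# Small fixed example (made-up data) so the behaviour is reproducible.
a = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-03-01 10:05:24", "2024-03-01 10:05:24", "2024-03-01 10:05:26"]),
    "A": [1, 2, 3],
})
b = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-03-01 10:05:25", "2024-03-01 10:05:26"]),
    "B": [4, 5],
})

for time, group1, group2 in joined_groups(a, b):
    print(time, len(group1), len(group2))
```

Both dictionaries are built up front, so this trades memory for simplicity. For large frames the itertools.groupby approach above, or merging on a floored timestamp column, may suit better.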