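I'm trying to merge two pandas.Series that have different datetime indexes, but I'm having trouble getting the right values in the merged result. I've seen posts that keep both series in a DataFrame, but I want to return a single Series containing the sum of the two.
Background: I have pandas Series containing the number of people detected in each room, and I want to combine them into a count for the whole building (2 rooms in the example below). I can do this by merging rooms pairwise until all of them have been combined into one (i.e. the building).
I feel like I have to sort the series and then walk through them row by row to get the right count. So far I've used zip() to step through the (sorted) series row by row, but I suspect there's a better way. Any ideas?
Here is a code snippet: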
import pandas as pd

# Input data
room1_idx = pd.to_datetime([
'2023-08-11T17:00:44', # 6 people counted
'2023-08-11T17:06:47', # 7 people counted
'2023-08-11T17:06:49', # 8 people counted
'2023-08-11T17:07:00', # 10 people counted
'2023-08-11T17:07:20', # 8 people counted
])
room1 = pd.Series([6, 7, 8, 10, 8], index=room1_idx, name="Room 1")
room2_idx = pd.to_datetime([
'2023-08-11T17:06:45', # 1 person counted
'2023-08-11T17:06:46', # 4 people counted
'2023-08-11T17:06:47', # 5 people counted
'2023-08-11T17:07:02', # 10 people counted
'2023-08-11T17:07:10', # 7 people counted
'2023-08-11T17:07:30', # 2 people counted
])
room2 = pd.Series([1, 4, 5, 10, 7, 2], index=room2_idx, name="Room 2")
print(room1)
print(room2)
What I want is a function that produces the following output:
building_idx = pd.to_datetime([
'2023-08-11 17:00:44', # 6+0 people counted
'2023-08-11 17:06:45', # 6+1 people counted
'2023-08-11 17:06:46', # 6+4 people counted
'2023-08-11 17:06:47', # 7+5 people counted
'2023-08-11 17:06:49', # 8+5 people counted
'2023-08-11 17:07:00', # 10+5 people counted
'2023-08-11 17:07:02', # 10+10 people counted
'2023-08-11 17:07:10', # 10+7 people counted
'2023-08-11 17:07:20', # 8+7 people counted
'2023-08-11 17:07:30', # 8+2 people counted
])
building = pd.Series([6, 7, 10, 12, 13, 15, 20, 17, 15, 10], index=building_idx, name="Building")
print(building)
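Use concat to align the two series on the union of their timestamps, ffill to carry each room's last known count forward, and sum across the columns: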
out = (pd
.concat([room1, room2], axis=1)
.ffill()
.sum(axis=1)
)
Output:
2023-08-11 17:00:44 6.0
2023-08-11 17:06:45 7.0
2023-08-11 17:06:46 10.0
2023-08-11 17:06:47 12.0
2023-08-11 17:06:49 13.0
2023-08-11 17:07:00 15.0
2023-08-11 17:07:02 20.0
2023-08-11 17:07:10 17.0
2023-08-11 17:07:20 15.0
2023-08-11 17:07:30 10.0
dtype: float64
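The result is float64 because aligning the two indexes introduces NaN before the forward fill. If you want this as a reusable function that returns integer counts for any number of rooms, something like the following sketch might work. The name `combine_counts` and its `name` parameter are my own choices for illustration, not part of pandas. It assumes each Series has unique timestamps and that a room with no reading yet contributes 0.

```python
import pandas as pd


def combine_counts(*rooms, name="Building"):
    """Sum step-wise counts from several Series on the union of their timestamps.

    Hypothetical helper: assumes every Series has a unique DatetimeIndex.
    """
    # Outer-align all rooms as columns; sort so the forward fill runs in time order.
    aligned = pd.concat(rooms, axis=1).sort_index()
    # Carry each room's last known count forward; treat "no reading yet" as 0.
    filled = aligned.ffill().fillna(0)
    # Sum across rooms and cast back to integers (no NaN remains at this point).
    return filled.sum(axis=1).astype(int).rename(name)


room1 = pd.Series(
    [6, 7, 8, 10, 8],
    index=pd.to_datetime([
        '2023-08-11T17:00:44', '2023-08-11T17:06:47', '2023-08-11T17:06:49',
        '2023-08-11T17:07:00', '2023-08-11T17:07:20',
    ]),
    name="Room 1",
)
room2 = pd.Series(
    [1, 4, 5, 10, 7, 2],
    index=pd.to_datetime([
        '2023-08-11T17:06:45', '2023-08-11T17:06:46', '2023-08-11T17:06:47',
        '2023-08-11T17:07:02', '2023-08-11T17:07:10', '2023-08-11T17:07:30',
    ]),
    name="Room 2",
)

building = combine_counts(room1, room2)
print(building)
```

Because the function takes any number of Series, you can pass all rooms at once instead of merging them two at a time.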