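Suppose I have two dataframes, A and B, each with an index named time and a list column named food. Both dataframes are like historical logs of the fruits and vegetables I had at each point in time: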
A:
food
time
2021-08-20 ["apple","orange"]
2021-08-28 ["apple","orange","banana"]
B:
food
time
2021-08-19 ["squash"]
2021-08-24 ["squash","carrot"]
2021-08-29 ["carrot"]
How can I combine these two dataframes so that the result tracks both fruits and vegetables at the same time?
food
time
2021-08-19 ["squash"]
2021-08-20 ["apple","orange","squash"]
2021-08-24 ["apple","orange","squash","carrot"]
2021-08-28 ["apple","orange","banana","squash","carrot"]
2021-08-29 ["apple","orange","banana","carrot"]
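Essentially, I want to merge the rows and, for each row, combine the foods of the two most recent entries up to that timestamp. The foods in A and B are guaranteed not to overlap, and the timestamps of A and B do not overlap either.
I tried using pd.concat([A,B]) directly, but it does not combine the foods.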
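For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two frames and shows what pd.concat returns. The construction code is an assumption added for illustration (the post only shows the printed frames), and the dates are kept as plain strings.

```python
import pandas as pd

# Rebuild the two example logs: `time` is the index, `food` holds lists.
A = pd.DataFrame(
    {"food": [["apple", "orange"], ["apple", "orange", "banana"]]},
    index=pd.Index(["2021-08-20", "2021-08-28"], name="time"),
)
B = pd.DataFrame(
    {"food": [["squash"], ["squash", "carrot"], ["carrot"]]},
    index=pd.Index(["2021-08-19", "2021-08-24", "2021-08-29"], name="time"),
)

# pd.concat only stacks the rows: each row keeps the list from its own frame.
stacked = pd.concat([A, B]).sort_index()
print(stacked)
```

After sorting, the rows are in chronological order, but the 2021-08-20 row still contains only the fruits and none of the vegetables.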
I believe this is what you're looking for:
import pandas as pd

# Create the first data frame
df_a = pd.DataFrame({
    'date': ['2021-08-20', '2021-08-28'],
    'foods': [['apple', 'orange'], ['apple', 'orange', 'banana']]
})
# Create the second data frame
df_b = pd.DataFrame({
    'date': ['2021-08-19', '2021-08-24', '2021-08-29'],
    'foods': [['squash'], ['squash', 'carrot'], ['carrot']]
})
# Merge the two data frames on the date column
merged = pd.merge(df_a, df_b, on='date', how='outer')
# Concatenate the food item lists
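def concat_foods(row):
    # A side that is missing after the outer merge shows up as NaN; treat it as an empty list
    foods_x = row['foods_x'] if isinstance(row['foods_x'], list) else []
    foods_y = row['foods_y'] if isinstance(row['foods_y'], list) else []
    # dict.fromkeys drops duplicates while keeping the original order (set() does not)
    return list(dict.fromkeys(foods_x + foods_y))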
merged['foods'] = merged.apply(concat_foods, axis=1)
# Remove the original food item columns
merged = merged.drop(['foods_x', 'foods_y'], axis=1)
# Sort the data frame by date
merged = merged.sort_values('date')
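Note that an outer merge only combines rows that share the same date. Since the question guarantees that the timestamps of A and B never overlap, each merged row would hold the list of just one frame. One possible way to get the running state is to forward-fill both list columns before combining them. The following is a minimal sketch of that idea, not part of the answer above; it uses the dates from the question, and the helper as_list is a hypothetical name introduced here.

```python
import pandas as pd

df_a = pd.DataFrame({
    "date": ["2021-08-20", "2021-08-28"],
    "foods": [["apple", "orange"], ["apple", "orange", "banana"]],
})
df_b = pd.DataFrame({
    "date": ["2021-08-19", "2021-08-24", "2021-08-29"],
    "foods": [["squash"], ["squash", "carrot"], ["carrot"]],
})

# Outer merge, then sort so that "previous row" means "earlier date".
merged = pd.merge(df_a, df_b, on="date", how="outer").sort_values("date")

# Carry the most recent list of each frame forward into the gaps.
filled = merged[["foods_x", "foods_y"]].ffill()

def as_list(value):
    """Hypothetical helper: rows before a frame's first entry are still NaN."""
    return value if isinstance(value, list) else []

merged["foods"] = [
    as_list(x) + as_list(y)
    for x, y in zip(filled["foods_x"], filled["foods_y"])
]
merged = merged[["date", "foods"]].reset_index(drop=True)
print(merged)
```

With the sample data this yields the five rows requested in the question, with the fruits listed before the vegetables in each row.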
One option using sliding_window_view:
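import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

A.index = pd.to_datetime(A.index)
B.index = pd.to_datetime(B.index)

# Stack both logs and order them chronologically
out = pd.concat([A, B]).sort_index()

N = 2  # window size: the current row plus the previous one
# Pad with N-1 empty lists so the first row gets a full window, then
# flatten each window and drop duplicates while preserving order
out['food'] = [list(dict.fromkeys([e for x in l for e in x]))
               for l in swv(pd.concat([pd.Series([[]]*(N-1)), out['food']]), N)]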
Output:
food
time
2021-08-19 [squash]
2021-08-20 [squash, apple, orange]
2021-08-24 [apple, orange, squash, carrot]
2021-08-28 [squash, carrot, apple, orange, banana]
2021-08-29 [apple, orange, banana, carrot]
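To make the windowing step easier to follow, here is a small sketch of what sliding_window_view does with the padding trick, using plain integers instead of lists. The array values are made up for illustration, and the function needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N = 2
values = np.array([10, 20, 30, 40])

# Prepend N-1 filler items so that the first window ends at the first real value
# and the number of windows equals the number of original items.
padded = np.concatenate([np.zeros(N - 1, dtype=values.dtype), values])
windows = sliding_window_view(padded, N)
print(windows.tolist())  # [[0, 10], [10, 20], [20, 30], [30, 40]]
```

Each window pairs a row with the one just before it. This matches the expected result for the sample data because the entries of A and B alternate in time. If two consecutive entries came from the same dataframe, the window would no longer contain the latest list of the other one, so a forward-fill per dataframe is probably safer when the goal is to track the latest state of each log.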