pandas - Combine two pandas dataframes with list columns, but merge the lists from the most recent timestamps

Question

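Suppose I have dataframes A and B, each with an index named time and a list column named food. Both dataframes are like historical logs of the fruits and vegetables I had at each point in time: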

A:

            food
time
2021-08-20  ["apple","orange"] 
2021-08-28  ["apple","orange","banana"]

B:

            food
time
2021-08-19  ["squash"] 
2021-08-24  ["squash","carrot"] 
2021-08-29  ["carrot"]
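To try the answers below, the two logs can be rebuilt as DataFrames with a DatetimeIndex named time and an object column food that holds Python lists. A minimal sketch, reusing the names A and B from the question:

```python
import pandas as pd

# Log A: fruits on hand, indexed by the date of each change
A = pd.DataFrame(
    {"food": [["apple", "orange"], ["apple", "orange", "banana"]]},
    index=pd.to_datetime(["2021-08-20", "2021-08-28"]).rename("time"),
)

# Log B: vegetables on hand, indexed the same way
B = pd.DataFrame(
    {"food": [["squash"], ["squash", "carrot"], ["carrot"]]},
    index=pd.to_datetime(["2021-08-19", "2021-08-24", "2021-08-29"]).rename("time"),
)

print(A)
print(B)
```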

How can I combine these two dataframes so that the result tracks both fruits and vegetables?

            food
time
2021-08-19  ["squash"]
2021-08-20  ["apple","orange","squash"] 
2021-08-24  ["apple","orange","squash","carrot"]
2021-08-28  ["apple","orange","banana","squash","carrot"]
2021-08-29  ["apple","orange","banana","carrot"]

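Essentially, I want to merge the rows and, for each row, combine the foods from the two most recent entries up to and including that timestamp. The foods in A and B are guaranteed not to overlap, and no timestamp appears in both A and B.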
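One way to read this rule: at each timestamp, take the latest entry from A and the latest entry from B at or before that time, and join their lists. For this data that produces the same rows as "the two most recent entries overall", because the entries of A and B alternate. The sketch below implements that reading (it is not one of the posted answers; the names timeline, a_state, b_state and as_list are illustrative) by reindexing each log onto the union of all timestamps with forward-fill. Since the foods in A and B never overlap, plain list concatenation is enough and no de-duplication is needed:

```python
import pandas as pd

A = pd.DataFrame(
    {"food": [["apple", "orange"], ["apple", "orange", "banana"]]},
    index=pd.to_datetime(["2021-08-20", "2021-08-28"]).rename("time"),
)
B = pd.DataFrame(
    {"food": [["squash"], ["squash", "carrot"], ["carrot"]]},
    index=pd.to_datetime(["2021-08-19", "2021-08-24", "2021-08-29"]).rename("time"),
)

# Every timestamp that appears in either log, in chronological order
timeline = A.index.union(B.index)

# Latest state of each log at every timestamp (NaN before a log's first entry);
# reindex with method="ffill" requires each log's index to be sorted
a_state = A["food"].reindex(timeline, method="ffill")
b_state = B["food"].reindex(timeline, method="ffill")

def as_list(value):
    # A missing state (NaN) means nothing from that log is on hand yet
    return value if isinstance(value, list) else []

# Fruits first, then vegetables, matching the column order of the desired output
out = pd.DataFrame(
    {"food": [as_list(a) + as_list(b) for a, b in zip(a_state, b_state)]},
    index=timeline,
)
print(out)
```

Because each log is forward-filled on its own, this version still pairs the latest fruit list with the latest vegetable list when one log has several consecutive entries, which a fixed window over the last two rows of the combined log would not.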

I tried using pd.concat([A,B]) directly, but it doesn't combine the foods.
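For reference, pd.concat only stacks the rows of both frames: each row keeps its own list, and nothing from the other log is carried forward. A small check on the sample data from the question (the name stacked is illustrative):

```python
import pandas as pd

A = pd.DataFrame(
    {"food": [["apple", "orange"], ["apple", "orange", "banana"]]},
    index=pd.to_datetime(["2021-08-20", "2021-08-28"]).rename("time"),
)
B = pd.DataFrame(
    {"food": [["squash"], ["squash", "carrot"], ["carrot"]]},
    index=pd.to_datetime(["2021-08-19", "2021-08-24", "2021-08-29"]).rename("time"),
)

# Five rows, one per original log entry, sorted by time;
# the 2021-08-20 row still holds only A's list, without B's squash
stacked = pd.concat([A, B]).sort_index()
print(stacked)
```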

Answer 1

I believe this is what you're looking for:

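import pandas as pd

# Create the first data frame (fruits)
df_a = pd.DataFrame({
    'date': ['2021-08-20', '2021-08-28'],
    'foods': [['apple', 'orange'], ['apple', 'orange', 'banana']]
})

# Create the second data frame (vegetables)
df_b = pd.DataFrame({
    'date': ['2021-08-19', '2021-08-24', '2021-08-29'],
    'foods': [['squash'], ['squash', 'carrot'], ['carrot']]
})

# Merge the two data frames on the date column; the outer join keeps every date,
# with NaN in foods_x / foods_y where a frame has no entry on that date
merged = pd.merge(df_a, df_b, on='date', how='outer')

# Sort by date, then carry each frame's most recent list forward
merged['date'] = pd.to_datetime(merged['date'])
merged = merged.sort_values('date').reset_index(drop=True)
merged[['foods_x', 'foods_y']] = merged[['foods_x', 'foods_y']].ffill()

# Concatenate the food item lists, treating a missing entry as an empty list
# (dict.fromkeys drops duplicates but, unlike set, keeps the original order)
def concat_foods(row):
    foods_x = row['foods_x'] if isinstance(row['foods_x'], list) else []
    foods_y = row['foods_y'] if isinstance(row['foods_y'], list) else []
    return list(dict.fromkeys(foods_x + foods_y))

merged['foods'] = merged.apply(concat_foods, axis=1)

# Remove the original food item columns
merged = merged.drop(['foods_x', 'foods_y'], axis=1)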

Answer 2

One option using sliding_window_view:

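import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

# Make sure both indexes are datetimes so they sort chronologically
A.index = pd.to_datetime(A.index)
B.index = pd.to_datetime(B.index)

# Stack both logs into a single timeline
out = pd.concat([A, B]).sort_index()

# Window size: each row is merged with the N-1 rows before it in the combined log
N = 2
# Pad the front with N-1 empty lists so the first row also gets a full window,
# then flatten each window and drop duplicates while keeping first-seen order
out['food'] = [list(dict.fromkeys([e for x in l for e in x]))
               for l in swv(pd.concat([pd.Series([[]]*(N-1)), out['food']]), N)]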
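To see what the list comprehension above iterates over: sliding_window_view(x, N) returns a view whose rows are the runs of N consecutive elements of x, so padding the front with N-1 empty lists gives each row a window made of itself and the N-1 rows before it, and dict.fromkeys then drops duplicates while keeping first-seen order. The stand-alone sketch below reproduces that step on a plain NumPy object array; the helper name merge_last_n is hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def merge_last_n(lists, n):
    """For each position i, merge lists[i-n+1 .. i] into one de-duplicated list."""
    # Build a 1-D object array: n-1 empty lists of padding, then the data.
    # Filling element by element avoids NumPy trying to build a 2-D array.
    padded = np.empty(len(lists) + n - 1, dtype=object)
    for i in range(n - 1):
        padded[i] = []
    for i, item in enumerate(lists):
        padded[n - 1 + i] = item
    # Row k of `windows` holds n consecutive lists ending at original row k
    windows = sliding_window_view(padded, n)
    return [list(dict.fromkeys(e for lst in w for e in lst)) for w in windows]

rows = [["squash"], ["apple", "orange"], ["squash", "carrot"]]
print(merge_last_n(rows, 2))
```

With n=2 the second row's window is (["squash"], ["apple", "orange"]), which is why the answer's output lists squash before apple and orange on 2021-08-20.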

Output:

                                               food
time                                               
2021-08-19                                 [squash]
2021-08-20                  [squash, apple, orange]
2021-08-24          [apple, orange, squash, carrot]
2021-08-28  [squash, carrot, apple, orange, banana]
2021-08-29          [apple, orange, banana, carrot]
