Pandas `merge_asof`, but filled with NaN values instead of repeated values


I currently have this quick example to work with:

import pandas as pd

left = pd.DataFrame({"left_val": [1, 2, 3, 6, 7]}, index=pd.to_datetime([1, 2, 3, 6, 7], unit='s'))
right = pd.DataFrame({"right_val": ["a", "b", "c"]}, index=pd.to_datetime([1, 5, 10], unit='s'))

# Filter to contain samples that are within the time interval of left
right_filtered = right[(right.index >= left.index.min()) & (right.index <= left.index.max())]

output = pd.merge_asof(left, right_filtered, left_index=True, right_index=True, direction="nearest")

My output is:

                     left_val right_val
1970-01-01 00:00:01         1         a
1970-01-01 00:00:02         2         a
1970-01-01 00:00:03         3         a
1970-01-01 00:00:06         6         b
1970-01-01 00:00:07         7         b

But I want the following:

                     left_val right_val
1970-01-01 00:00:01         1         a
1970-01-01 00:00:02         2       NaN
1970-01-01 00:00:03         3       NaN
1970-01-01 00:00:06         6         b
1970-01-01 00:00:07         7       NaN

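The main difference is that I want each right value to appear only once in the output DataFrame, with NaN in all other rows, so that I can create a sparse DataFrame and save some space. I want to avoid iterating over the result to set the repeated values to NaN, because: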

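  • Speed
  • If right contains two consecutive values that are identical, this approach would wipe out information that was in the original data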
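To make the second point concrete, here is a small sketch with made-up data, assuming right holds the same value "a" at 1 s and at 6 s. After the nearest merge every row reads "a", so blanking out any value that repeats the row above keeps only the first one, and the match for the 6 s sample is lost.

```python
import pandas as pd

left = pd.DataFrame({"left_val": [1, 2, 3, 6, 7]},
                    index=pd.to_datetime([1, 2, 3, 6, 7], unit="s"))
# Hypothetical data: two consecutive right samples carrying the same value.
right = pd.DataFrame({"right_val": ["a", "a"]},
                     index=pd.to_datetime([1, 6], unit="s"))

merged = pd.merge_asof(left, right, left_index=True, right_index=True,
                       direction="nearest")

# Naive de-duplication: keep a value only where it differs from the row above.
first_in_run = merged["right_val"].ne(merged["right_val"].shift())
deduped = merged["right_val"].where(first_in_run)

print(merged["right_val"].tolist())  # every left row was matched to an "a"
print(deduped.notna().sum())         # non-NaN values left after de-duplication
```

The first print shows five "a" values; the second prints 1, even though two distinct right samples were matched.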

I have been looking for input parameters and methods that do something like this, but I couldn't find any.

Thanks!

python pandas dataframe
1 Answer

You can post-process the output after the .merge_asof:

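import numpy as np

# Give each run of consecutive identical right_val values its own group id
groups = (output["right_val"] != output["right_val"].shift(1)).cumsum()
# Keep the value only on the first row of each run; set the repeats to NaN
output["right_val"] = np.where(~groups.duplicated(), output["right_val"], np.nan)

print(output)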

Prints:

                     left_val right_val
1970-01-01 00:00:01         1         a
1970-01-01 00:00:02         2       NaN
1970-01-01 00:00:03         3       NaN
1970-01-01 00:00:06         6         b
1970-01-01 00:00:07         7       NaN
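This post-processing works on runs of identical values, so it shares the limitation raised in the question: if two consecutive right samples carry the same value, the second match is blanked out. A different approach, sketched here under some assumptions, is to reverse the merge: call merge_asof with right as the left frame, so each right row looks up its single nearest left timestamp, and then write the values back onto those rows only. The helper name place_nearest_once is hypothetical. It assumes both frames have sorted DatetimeIndexes of the same dtype, and if two right rows land on the same left row it keeps the last one.

```python
import pandas as pd


def place_nearest_once(left, right, col):
    """Hypothetical helper: put each value of right[col] on the one left row
    whose timestamp is nearest to it; every other row gets NaN.
    Assumes both frames have sorted DatetimeIndexes of the same dtype."""
    # Drop right samples outside left's time span, as in the question.
    inside = right[(right.index >= left.index.min()) & (right.index <= left.index.max())]
    # Reverse merge: for every right row, look up the nearest left timestamp.
    matched = pd.merge_asof(
        inside[[col]],
        left.index.to_frame(name="left_time"),
        left_index=True,
        right_index=True,
        direction="nearest",
    )
    # Re-key the right values by the left timestamp each one was matched to.
    placed = pd.Series(matched[col].to_numpy(), index=matched["left_time"].to_numpy())
    # Assumption: if several right rows hit the same left row, keep the last one.
    placed = placed[~placed.index.duplicated(keep="last")]
    # Left rows that received no right value become NaN.
    return left.assign(**{col: placed.reindex(left.index)})


left = pd.DataFrame({"left_val": [1, 2, 3, 6, 7]},
                    index=pd.to_datetime([1, 2, 3, 6, 7], unit="s"))
right = pd.DataFrame({"right_val": ["a", "b", "c"]},
                     index=pd.to_datetime([1, 5, 10], unit="s"))
# Hypothetical data with two consecutive identical right values.
right_dup = pd.DataFrame({"right_val": ["a", "a"]},
                         index=pd.to_datetime([1, 6], unit="s"))

print(place_nearest_once(left, right, "right_val"))
print(place_nearest_once(left, right_dup, "right_val"))
```

For the question's data this should place "a" at 00:00:01 and "b" at 00:00:06 with NaN elsewhere; for right_dup both "a" values survive, at 00:00:01 and 00:00:06. Because each right row is matched exactly once, no iteration over the result and no comparison between neighbouring rows is needed.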