I have two Series of unequal length (both indexed by date).
s1:
2006-03-25 35.27
2006-03-26 35.22
2006-03-28 35.25
2006-04-04 35.29
2006-04-05 35.46
2006-04-06 35.21
2006-04-08 35.32
2006-04-10 35.77
s2:
2006-03-25 1800
2006-03-26 1800
2006-03-27 1800
2006-03-28 1800
2006-03-29 1800
2006-03-30 1800
2006-03-31 1800
2006-04-01 2555
2006-04-02 2555
2006-04-03 2555
2006-04-04 2555
2006-04-05 2555
2006-04-06 2555
2006-04-07 2555
2006-04-08 2555
2006-04-09 2555
2006-04-10 2555
The two Series should be merged so that each day missing from s1 takes the previous day's value.
The output should look like this:
2006-03-25 35.27 1800
2006-03-26 35.22 1800
2006-03-27 35.22 1800
2006-03-28 35.25 1800
2006-03-29 35.25 1800
2006-03-30 35.25 1800
2006-03-31 35.25 1800
2006-04-01 35.25 2555
2006-04-02 35.25 2555
2006-04-03 35.25 2555
2006-04-04 35.29 2555
2006-04-05 35.46 2555
2006-04-06 35.21 2555
2006-04-07 35.21 2555
2006-04-08 35.32 2555
2006-04-09 35.32 2555
2006-04-10 35.77 2555
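You can turn them into pandas DataFrames and then use the pandas merge method with an outer join of the two datasets. After that, fill the NaN values. Since the question asks for the previous day's value, sort by date and use ffill() (forward fill) on the s1 column; bfill() fills backward, so each gap would take the next day's value instead.
import pandas as pd
# Assumes s1 and s2 are Series indexed by date; move the index into a 'date' column
df1 = s1.rename_axis('date').reset_index(name='vol1')
df2 = s2.rename_axis('date').reset_index(name='vol2')
# Outer join keeps every date that appears in either Series
result = df1.merge(df2, on='date', how='outer').sort_values('date', ignore_index=True)
# Forward-fill the gaps in s1's column with the previous day's value
result['vol1'] = result['vol1'].ffill()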
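To see why the fill direction matters here, the following is a minimal sketch using only the first four days from the question. The names days, s1_head, aligned, forward and backward are illustrative and not part of the answer above.

```python
import pandas as pd

# First four calendar days of the question's range; 2006-03-27 is missing from s1
days = pd.date_range("2006-03-25", "2006-03-28", freq="D")
s1_head = pd.Series(
    [35.27, 35.22, 35.25],
    index=pd.to_datetime(["2006-03-25", "2006-03-26", "2006-03-28"]),
)

aligned = s1_head.reindex(days)  # 2006-03-27 becomes NaN
forward = aligned.ffill()        # gap takes the previous day's value (2006-03-26)
backward = aligned.bfill()       # gap takes the next day's value (2006-03-28)
```

Only the forward-filled result matches the expected output in the question, where 2006-03-27 repeats the value from 2006-03-26.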
Here is one way to do it:
# Use clipboard to create s1 as a pd.Series
s1 = pd.read_clipboard(index_col=[0], names=['s1']).squeeze()
# Use clipboard to create s2 as a pd.Series
s2 = pd.read_clipboard(index_col=[0], names=['s2']).squeeze()
Let's reindex s1 to match s2, then fill the NaN values with the previous values.
df_out = pd.concat([s1.reindex_like(s2).ffill(), s2], axis=1)
df_out
Output:
s1 s2
2006-03-25 35.27 1800
2006-03-26 35.22 1800
2006-03-27 35.22 1800
2006-03-28 35.25 1800
2006-03-29 35.25 1800
2006-03-30 35.25 1800
2006-03-31 35.25 1800
2006-04-01 35.25 2555
2006-04-02 35.25 2555
2006-04-03 35.25 2555
2006-04-04 35.29 2555
2006-04-05 35.46 2555
2006-04-06 35.21 2555
2006-04-07 35.21 2555
2006-04-08 35.32 2555
2006-04-09 35.32 2555
2006-04-10 35.77 2555
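If the clipboard is not convenient, the same result can probably be reproduced with a self-contained sketch like the one below. It builds both Series by hand from the values in the question and uses reindex with method="ffill" as a one-step variant of reindex_like(...).ffill(). The explicit construction and the name s2_index are assumptions made for illustration.

```python
import pandas as pd

# s1: irregular dates, values copied from the question
s1 = pd.Series(
    [35.27, 35.22, 35.25, 35.29, 35.46, 35.21, 35.32, 35.77],
    index=pd.to_datetime([
        "2006-03-25", "2006-03-26", "2006-03-28", "2006-04-04",
        "2006-04-05", "2006-04-06", "2006-04-08", "2006-04-10",
    ]),
    name="s1",
)

# s2: one value per day, 1800 through March 31 and 2555 from April 1
s2_index = pd.date_range("2006-03-25", "2006-04-10", freq="D")
s2 = pd.Series([1800] * 7 + [2555] * 10, index=s2_index, name="s2")

# Align s1 to s2's daily index, carrying the last known value forward
df_out = pd.concat([s1.reindex(s2.index, method="ffill"), s2], axis=1)
print(df_out)
```

Note that method="ffill" requires the index of s1 to be sorted, which holds for the data in the question.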