pandas interpolation does not return the expected results


I have data points from a large CSV file that look like this:

MMSI,BaseDateTime,LAT,LON,SOG,COG
11,0,27.29237,-90.96787,0.1,195.7
11,360,27.29237,-90.96793,0.1,188.0
111,0,27.3538,-94.6253,0.1,35.3
111,180,27.35376,-94.62543,0.1,225.5

BaseDateTime is the time in seconds, measured from the first entry of each MMSI to the current entry.

I want to interpolate this data so that, instead of these unevenly spaced BaseDateTime entries, I have one entry every 30 seconds up to the last point.
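To illustrate the intended result, here is a minimal NumPy-only sketch for a single MMSI. It assumes plain linear interpolation between the two sample rows for MMSI 11 and is not the code from the question; the variable names are made up for the example.

```python
import numpy as np

# Sample rows for MMSI 11 from the data above
t = np.array([0.0, 360.0])               # BaseDateTime in seconds
lon = np.array([-90.96787, -90.96793])   # LON
cog = np.array([195.7, 188.0])           # COG

# Regular 30-second grid from the first to the last timestamp (inclusive);
# this assumes the last timestamp is a multiple of 30 seconds
grid = np.arange(t[0], t[-1] + 30.0, 30.0)

# Linear interpolation of each column onto the grid
lon_30s = np.interp(grid, t, lon)
cog_30s = np.interp(grid, t, cog)

for ts, lo, co in zip(grid, lon_30s, cog_30s):
    print(f"11,{ts:.1f},{lo:.5f},{co:.2f}")
```

With this approach LON and COG change a little at every 30-second step instead of staying constant.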

To do this, I wrote the following code:

import numpy as np
import pandas as pd


def interpolater(df: np.ndarray):
    # Create a Pandas DataFrame from the NumPy array
    df = pd.DataFrame(df, columns=['MMSI', 'BaseDateTime', 'LAT', 'LON', 'SOG', 'COG'])

    df = df.groupby(['MMSI'])
    interpolated_dfs = []

    for name, group in df:
        # Set 'BaseDateTime' as the index for each group
        group.set_index('BaseDateTime', inplace=True)
        group.index = pd.to_timedelta(group.index, unit='s')
        group = group.astype({'LAT': 'float32', 'LON': 'float32', 'SOG': 'float32', 'COG': 'float32'})
        group = group.infer_objects(copy=False)
        # Resample the data at 30-second intervals and interpolate linearly
        resampled_group = group.resample('30S').interpolate(method='linear')

        # Reset the index
        resampled_group.reset_index(inplace=True)

        # Convert the time-based index back to its original format (e.g., seconds)
        resampled_group['BaseDateTime'] = resampled_group['BaseDateTime'].dt.total_seconds()

        # Append the resampled group to the list of interpolated DataFrames
        interpolated_dfs.append(resampled_group)

    # Concatenate the list of DataFrames back into a single DataFrame
    interpolated_df = pd.concat(interpolated_dfs)

    # Convert the result to a NumPy array
    interpolated_data = interpolated_df.to_numpy()

    return interpolated_data

This code transforms the data along the time dimension as expected, as shown below:

MMSI,BaseDateTime,Latitude,Longitude,SOG,COG
11,0.0,27.29237,-90.96787,0.1,195.7
11,30.0,27.29237,-90.96787,0.1,195.7
11,60.0,27.29237,-90.96787,0.1,195.7
11,90.0,27.29237,-90.96787,0.1,195.7
11,120.0,27.29237,-90.96787,0.1,195.7
11,150.0,27.29237,-90.96787,0.1,195.7
11,180.0,27.29237,-90.96787,0.1,195.7
11,210.0,27.29237,-90.96787,0.1,195.7
11,240.0,27.29237,-90.96787,0.1,195.7
11,270.0,27.29237,-90.96787,0.1,195.7
11,300.0,27.29237,-90.96787,0.1,195.7
11,330.0,27.29237,-90.96787,0.1,195.7
11,360.0,27.29237,-90.96787,0.1,195.7
111,0.0,27.3538,-94.6253,0.1,35.3
111,30.0,27.3538,-94.6253,0.1,35.3
111,60.0,27.3538,-94.6253,0.1,35.3
111,90.0,27.3538,-94.6253,0.1,35.3
111,120.0,27.3538,-94.6253,0.1,35.3
111,150.0,27.3538,-94.6253,0.1,35.3
111,180.0,27.3538,-94.6253,0.1,35.3

My problem is that LAT, LON, SOG and COG show no apparent change, and I am not sure how to fix this.

python pandas interpolation
1 Answer

You need to tell resample how to resample. Try adding mean(), for example:

resampled_group = group.resample('30S').mean().interpolate(method='linear')

See the resample documentation.
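As a rough, self-contained sketch of this suggestion applied to the four sample rows from the question: as far as I understand, resample(...).interpolate() first reindexes onto the new grid, so samples that do not fall exactly on a 30-second boundary can be dropped before interpolation, whereas mean() aggregates each 30-second bin first and leaves NaN only in the empty bins. The helper name resample_30s is hypothetical, and pd.Timedelta(seconds=30) is used in place of the '30S' string only to avoid depending on a particular frequency alias.

```python
import pandas as pd

# The four sample rows from the question
raw = pd.DataFrame(
    {
        "MMSI": [11, 11, 111, 111],
        "BaseDateTime": [0, 360, 0, 180],
        "LAT": [27.29237, 27.29237, 27.3538, 27.35376],
        "LON": [-90.96787, -90.96793, -94.6253, -94.62543],
        "SOG": [0.1, 0.1, 0.1, 0.1],
        "COG": [195.7, 188.0, 35.3, 225.5],
    }
)


def resample_30s(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: put one MMSI group on a 30-second grid."""
    # Use the elapsed seconds as a TimedeltaIndex
    index = pd.to_timedelta(group["BaseDateTime"], unit="s")
    values = group.drop(columns=["MMSI", "BaseDateTime"]).set_index(index)

    # Aggregate each 30-second bin, then fill the empty bins linearly
    out = values.resample(pd.Timedelta(seconds=30)).mean()
    out = out.interpolate(method="linear")

    # Convert the index back to seconds and restore it as a column
    out.index = out.index.total_seconds()
    return out.rename_axis("BaseDateTime").reset_index()


parts = []
for mmsi, group in raw.groupby("MMSI"):
    part = resample_30s(group)
    part.insert(0, "MMSI", mmsi)  # re-attach the group key
    parts.append(part)

result = pd.concat(parts, ignore_index=True)
print(result)
```

On this sample, LON and COG for each MMSI move gradually from the first to the last recorded value rather than repeating the first row.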
