Append unique timestamps to a column for each row, based on another column's value


Good afternoon, community,

I currently have a large file with several column headers, including fix_timestamps and id in particular. The data is formatted as follows.

fix_timestamps id
2023-08-01 00:02:52.527 WPA54
2023-08-01 00:02:52.527 WPA54
2023-08-01 00:02:52.527 WPA54
2023-08-01 00:10:10.640 WPA54
2023-08-01 00:10:10.640 WPA54
2023-08-01 00:10:26.937 WPA54
2023-08-01 00:10:26.937 WPA54
2023-08-01 00:10:10.640 IBT675
2023-08-01 00:10:10.640 IBT675
2023-08-01 00:10:10.640 IBT675
2023-08-01 00:10:26.937 IBT675
2023-08-01 00:10:26.937 IBT675
2023-08-01 00:02:52.527 IBT675
2023-08-01 00:02:52.527 IBT675
etc... etc.

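I would like to read my file and, for each value in the id column, find the unique times in the fix_timestamps column and join them into a new fix_timestamps value, separated by ';'. Essentially, this would produce a DataFrame like the following: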

fix_timestamps id
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675

My current script follows part of this logic, but not all of it, and I have been trying to figure out why.

import pandas as pd

# Read the CSV file
file_path = 'input.csv'
df = pd.read_csv(file_path)

# Function to append unique timestamps based on ID length
def append_timestamp(row):
    id_length = len(row['id'])
    timestamps = []

    # for i in range(id_length):
    #     timestamps.add(row['fix_timestamps'])
    # return ';'.join(timestamps)

    for i in range(id_length):
        timestamps.append(row['fix_timestamps'])
    return ';'.join(timestamps)

# Apply the function to the DataFrame rows
df['fix_timestamps'] = df.apply(append_timestamp, axis=1)

print(df)

# Save the DataFrame to a CSV file
output_file_path = 'output'
df.to_csv(output_file_path, index=False)

Running the input example above through my current script produces essentially the following output:

fix_timestamps id
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... WPA54
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... IBT675

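So, although I want to collect all the unique times for each id, the script just seems to repeat the timestamp of each row, appending the same value over and over.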

Thank you all.

pandas append grouping uniqueidentifier
1 answer

You can use:

df['fix_timestamps'] = (df['fix_timestamps'].astype(str).groupby(df['id'])
                        .transform(lambda x: ';'.join(x.unique()))
                       )

Output:

                                       fix_timestamps      id
0   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
1   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
2   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
3   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
4   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
5   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
6   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
7   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
8   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
9   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
10  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
11  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
12  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
13  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
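
For context, the function in the question cannot see other rows: with axis=1, apply passes one row at a time, and len(row['id']) is the number of characters in the id string (5 for 'WPA54'), so the loop appends that row's own timestamp five times. Grouping by id avoids this. The sketch below is a self-contained check of that approach. It rebuilds the question's sample rows in memory instead of reading input.csv, and the display.max_colwidth setting is an optional addition that is not part of the answer above; it only stops pandas from truncating the long strings with "..." when printing.

```python
import pandas as pd

# Sample rows copied from the question; a real run would use pd.read_csv('input.csv').
df = pd.DataFrame({
    'fix_timestamps': [
        '2023-08-01 00:02:52.527', '2023-08-01 00:02:52.527', '2023-08-01 00:02:52.527',
        '2023-08-01 00:10:10.640', '2023-08-01 00:10:10.640',
        '2023-08-01 00:10:26.937', '2023-08-01 00:10:26.937',
        '2023-08-01 00:10:10.640', '2023-08-01 00:10:10.640', '2023-08-01 00:10:10.640',
        '2023-08-01 00:10:26.937', '2023-08-01 00:10:26.937',
        '2023-08-01 00:02:52.527', '2023-08-01 00:02:52.527',
    ],
    'id': ['WPA54'] * 7 + ['IBT675'] * 7,
})

# For each id, join that group's unique timestamps (in order of first appearance)
# and broadcast the joined string back onto every row of the group.
df['fix_timestamps'] = (
    df['fix_timestamps'].astype(str)
    .groupby(df['id'])
    .transform(lambda x: ';'.join(x.unique()))
)

# Optional: show the full strings instead of the truncated "..." form.
pd.set_option('display.max_colwidth', None)
print(df.to_string(index=False))
```

Because unique() keeps values in order of first appearance, the IBT675 rows start with 00:10:10.640, as in the desired output. If chronological order is preferred, the lambda could join sorted(x.unique()) instead.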