Good afternoon, community,
I currently have a large file with several column headers, notably fix_timestamps and id. The data looks like this.
fix_timestamps | id |
---|---|
2023-08-01 00:02:52.527 | WPA54 |
2023-08-01 00:02:52.527 | WPA54 |
2023-08-01 00:02:52.527 | WPA54 |
2023-08-01 00:10:10.640 | WPA54 |
2023-08-01 00:10:10.640 | WPA54 |
2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:10:10.640 | IBT675 |
2023-08-01 00:10:10.640 | IBT675 |
2023-08-01 00:10:10.640 | IBT675 |
2023-08-01 00:10:26.937 | IBT675 |
2023-08-01 00:10:26.937 | IBT675 |
2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:02:52.527 | IBT675 |
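etc... |
I would like to read in my file and, for the rows sharing each value of the id column, find the unique times in the fix_timestamps column and join them into a new fix_timestamps value separated by ';'. Essentially, this should produce a DataFrame like the following:
fix_timestamps | id |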
---|---|
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
My current script follows some of this logic, but not all of it, and I have been trying to work out why:
import pandas as pd

# Read the CSV file
file_path = 'input.csv'
df = pd.read_csv(file_path)

# Function to append unique timestamps based on ID length
def append_timestamp(row):
    id_length = len(row['id'])
    timestamps = []
    # for i in range(id_length):
    #     timestamps.add(row['fix_timestamps'])
    # return ';'.join(timestamps)
    for i in range(id_length):
        timestamps.append(row['fix_timestamps'])
    return ';'.join(timestamps)

# Apply the function to the DataFrame rows
df['fix_timestamps'] = df.apply(append_timestamp, axis=1)
print(df)

# Save the DataFrame to a CSV file
output_file_path = 'output'
df.to_csv(output_file_path, index=False)
Running the input example above through my current script essentially produces the following output:
fix_timestamps | id |
---|---|
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... | WPA54 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... | IBT675 |
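So when I try to do this for all the unique times of an id, the script seems to just repeat each row's own timestamp and append the same value over and over.
Thank you all.
You can use: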
df['fix_timestamps'] = (df['fix_timestamps'].astype(str).groupby(df['id'])
.transform(lambda x: ';'.join(x.unique()))
)
Output:
fix_timestamps id
0 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
1 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
2 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
3 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
4 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
5 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
6 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
7 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
8 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
9 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
10 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
11 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
12 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
13 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
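For context on why the original script repeats values: `df.apply(..., axis=1)` hands the function one row at a time, so it never sees the other rows with the same id, and `len(row['id'])` is the number of characters in the id string (5 for `WPA54`), not the number of rows in the group. The sketch below reproduces the groupby approach on a small in-memory frame. The sample rows are a shortened version of the question's data, and the helper name `join_unique` is made up for illustration.

```python
import pandas as pd

# Shortened sample of the question's data (assumption: timestamps are read as strings).
df = pd.DataFrame({
    "fix_timestamps": [
        "2023-08-01 00:02:52.527",
        "2023-08-01 00:02:52.527",
        "2023-08-01 00:10:10.640",
        "2023-08-01 00:10:26.937",
        "2023-08-01 00:10:10.640",
        "2023-08-01 00:10:26.937",
        "2023-08-01 00:02:52.527",
    ],
    "id": ["WPA54", "WPA54", "WPA54", "WPA54", "IBT675", "IBT675", "IBT675"],
})


def join_unique(values: pd.Series) -> str:
    """Join the distinct values of one group, in order of first appearance."""
    return ";".join(values.unique())


# transform() broadcasts the single joined string back to every row of its group,
# so the frame keeps its original number of rows.
df["fix_timestamps"] = (
    df["fix_timestamps"].astype(str).groupby(df["id"]).transform(join_unique)
)

print(df.to_string())
```

Because `unique()` keeps first-appearance order, each id's joined string follows the order in which its timestamps occur in the file, which matches the desired output in the question. If sorted timestamps are wanted instead, sorting the values inside the helper before joining would be one option.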