In Python, I have a list of lists of pandas Timestamps. For example, consider the following type hint and input:
input: list[list[pd.Timestamp]]
import pandas as pd
import numpy as np
# Example list of lists of Pandas Timestamps
input = [[pd.Timestamp('2023-09-01 10:00:00'), pd.Timestamp('2023-09-01 11:00:00')],
[pd.Timestamp('2023-09-02 12:00:00'), pd.Timestamp('2023-09-02 13:00:00')],
[pd.Timestamp('2023-09-03 14:00:00'), pd.Timestamp('2023-09-03 15:00:00')]]
This input represents a list of datetime ranges. For debugging and simplicity, I prefer to view this information as a DataFrame:
input_df = pd.DataFrame(input, columns=[['left', 'right']])
Now I need to:
strftime("%Y/%m/%d/%H")
Now, should I do this in pandas, in numpy (since all elements are of the same type), or in plain Python? What is the fastest and most elegant way?
You can use:
# Use a flat list of column names to avoid creating a MultiIndex
input_df = pd.DataFrame(input, columns=['left', 'right'])
# Subtract the columns and convert the difference to a whole number of hourly steps
repeat = (input_df['right'].sub(input_df['left'])
          .dt.total_seconds().div(3600).add(1).astype(int))
# Repeat each left value once per hourly step
s = input_df.loc[input_df.index.repeat(repeat), 'left']
# Add an hourly counter, drop duplicates, and convert to the custom format
out = (s.add(pd.to_timedelta(s.groupby(s).cumcount(), unit='h')).drop_duplicates()
       .dt.strftime("%Y/%m/%d/%H").tolist())
print(out)
['2023/09/01/10', '2023/09/01/11', '2023/09/02/12',
'2023/09/02/13', '2023/09/03/14', '2023/09/03/15']
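If the vectorized version is hard to follow, the same idea (expand each range in hourly steps, drop duplicates, format) can be sketched with an explicit loop. This is only an illustration, not a benchmarked alternative; the names `ranges` and `hourly_labels` are made up for this sketch, and it assumes each range starts and ends on a whole hour.

```python
import pandas as pd

# Same example ranges as in the question; `ranges` is a name chosen for this sketch.
ranges = [
    [pd.Timestamp('2023-09-01 10:00:00'), pd.Timestamp('2023-09-01 11:00:00')],
    [pd.Timestamp('2023-09-02 12:00:00'), pd.Timestamp('2023-09-02 13:00:00')],
    [pd.Timestamp('2023-09-03 14:00:00'), pd.Timestamp('2023-09-03 15:00:00')],
]


def hourly_labels(ranges, fmt="%Y/%m/%d/%H"):
    """Expand each [left, right] range in one-hour steps, dedupe, and format."""
    seen = {}  # a dict preserves insertion order, so labels stay in input order
    for left, right in ranges:
        # date_range includes both endpoints when they fall on the hourly grid
        for ts in pd.date_range(left, right, freq=pd.Timedelta(hours=1)):
            seen.setdefault(ts.strftime(fmt), None)
    return list(seen)


out = hourly_labels(ranges)
print(out)
```

For large inputs the vectorized answer above is likely to be faster, since this loop builds one `date_range` per row.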
You can achieve the desired result efficiently with pandas:
import pandas as pd
# Example list of lists of Pandas Timestamps
input_data = [
[pd.Timestamp('2023-09-01 10:00:00'), pd.Timestamp('2023-09-01 11:00:00')],
[pd.Timestamp('2023-09-02 12:00:00'), pd.Timestamp('2023-09-02 13:00:00')],
[pd.Timestamp('2023-09-03 14:00:00'), pd.Timestamp('2023-09-03 15:00:00')]
]
# Create a DataFrame
input_df = pd.DataFrame(input_data, columns=['left', 'right'])
# Step 1: Flatten the input
flattened_df = input_df.stack().reset_index(drop=True)
# Step 2: Remove duplicates
unique_df = flattened_df.drop_duplicates()
# Step 3: Apply the same string format
# (stack() returned a Series, so use the .dt accessor on it directly)
formatted = unique_df.dt.strftime("%Y/%m/%d/%H")
# Resulting Series with flattened, unique, and formatted timestamps
print(formatted)
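Since the question also asks about numpy, here is a rough sketch of the same flatten / deduplicate / format steps done on a `datetime64` array. Like the pandas snippet above, it only looks at the range endpoints and does not fill in hours between them; the variable names are chosen for this example.

```python
import numpy as np
import pandas as pd

# Same example ranges as in the question; `ranges` is a name chosen for this sketch.
ranges = [
    [pd.Timestamp('2023-09-01 10:00:00'), pd.Timestamp('2023-09-01 11:00:00')],
    [pd.Timestamp('2023-09-02 12:00:00'), pd.Timestamp('2023-09-02 13:00:00')],
    [pd.Timestamp('2023-09-03 14:00:00'), pd.Timestamp('2023-09-03 15:00:00')],
]

# Let pandas build a 2-D datetime64 array of shape (n_ranges, 2)
arr = pd.DataFrame(ranges, columns=['left', 'right']).to_numpy()

# np.unique flattens the array, sorts it, and removes duplicates in one call
uniq = np.unique(arr)

# numpy has no strftime, so hand the result back to pandas for formatting
out = pd.DatetimeIndex(uniq).strftime("%Y/%m/%d/%H").tolist()
print(out)
```

Note that `np.unique` returns sorted values, so the original row order is lost if the ranges were not already in chronological order.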