For example, consider df:

    Time  colA  colB
0    1.1     2     2
1    2.2     2     2
2    3.4     3     5
3    4.5     3     5
4    5.6     4     5
5    6.2     4     6
6    7.4     4     6
7    8.5     2     6
8    9.8     2     5
9   10.1     2     5
10  11.2     2     5
My expected output is a report CSV file with the following columns:

Col_name  unique_value  Duration
colA      2             3.8s
colA      3             1.1s
colA      4             1.8s
colB      2             1.1s
colB      5             3.6s
colB      6             2.3s
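(Example) Calculation for colA:
unique_value = 2: Duration = [time difference of the 1st consecutive run of 2s (2.2 - 1.1)] + [time difference of the 2nd consecutive run (11.2 - 8.5)] = 1.1 + 2.7 = 3.8s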
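The arithmetic above can be checked directly. This is a minimal sketch that assumes the Time values from the example table; the names `times`, `first_run`, and `second_run` are illustrative only.

```python
# Time values copied from the example table (rows 0..10)
times = [1.1, 2.2, 3.4, 4.5, 5.6, 6.2, 7.4, 8.5, 9.8, 10.1, 11.2]

# colA equals 2 on rows 0-1 and again on rows 7-10
first_run = times[1] - times[0]      # 2.2 - 1.1
second_run = times[10] - times[7]    # 11.2 - 8.5

# Round to one decimal to hide floating-point noise
total = round(first_run + second_run, 1)
print(total)  # 3.8
```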
One of the approaches I tried is:
df["answer"] = df['colA'].diff().eq(0)
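For the sample data, that expression produces the flags shown below. This is only an illustration; the DataFrame is rebuilt here from the table in the question, restricted to Time and colA.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (Time and colA only)
df = pd.DataFrame({
    'Time': [1.1, 2.2, 3.4, 4.5, 5.6, 6.2, 7.4, 8.5, 9.8, 10.1, 11.2],
    'colA': [2, 2, 3, 3, 4, 4, 4, 2, 2, 2, 2],
})

# True where the value repeats the previous row, False where it changes.
# The first row has no predecessor, so its diff is NaN and the flag is False.
df["answer"] = df['colA'].diff().eq(0)

print(df["answer"].tolist())
# [False, True, False, True, False, True, True, False, True, True, True]
```

Each False marks the first row of a new run, so the flag alone shows where runs begin but not which value each run holds.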
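As a next step, I planned to collect all the False entries in one list and all the True entries in another, and then take the difference between the lists.
Where I am stuck is how to tie these back to the unique values.
Please help me figure out whether the existing logic can work or whether it should be changed.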
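*Creating a new column that flags consecutive values with True and False is a good start. However, to calculate the duration of each unique value in each column, you can use the following steps: Iterate over each unique value in each column. For each unique value, find its consecutive occurrences and calculate the duration. Store the results in a new DataFrame.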
import pandas as pd

# Sample DataFrame (values taken from the table in the question)
data = {
    'Time': [1.1, 2.2, 3.4, 4.5, 5.6, 6.2, 7.4, 8.5, 9.8, 10.1, 11.2],
    'colA': [2, 2, 3, 3, 4, 4, 4, 2, 2, 2, 2],
    'colB': [2, 2, 5, 5, 5, 6, 6, 6, 5, 5, 5]
}
df = pd.DataFrame(data)

# Function to calculate the total duration for each unique value in a column
def calculate_duration(column_name):
    durations = []
    unique_values = df[column_name].unique()
    for value in unique_values:
        # Row labels where the column equals this value
        consecutive_indices = df[df[column_name] == value].index.to_list()
        # Split those labels into runs of consecutive rows
        consecutive_occurrences = []
        current_occurrence = [consecutive_indices[0]]
        for i in range(1, len(consecutive_indices)):
            if consecutive_indices[i] - consecutive_indices[i-1] == 1:
                current_occurrence.append(consecutive_indices[i])
            else:
                consecutive_occurrences.append(current_occurrence)
                current_occurrence = [consecutive_indices[i]]
        consecutive_occurrences.append(current_occurrence)
        # Sum (last Time - first Time) over all runs of this value
        total = 0.0
        for occurrence in consecutive_occurrences:
            start_time = df.loc[occurrence[0], 'Time']
            end_time = df.loc[occurrence[-1], 'Time']
            total += end_time - start_time
        # Round to one decimal to hide floating-point noise
        durations.append((value, round(total, 1)))
    return durations

# Collect one report row per (column, unique value)
rows = []
for column in df.columns[1:]:
    for value, duration in calculate_duration(column):
        rows.append({'Col_name': column, 'unique_value': value, 'Duration': duration})

# Create a DataFrame to store results
report_df = pd.DataFrame(rows, columns=['Col_name', 'unique_value', 'Duration'])

# Export to CSV
report_df.to_csv('report.csv', index=False)
Finally, export the DataFrame to a CSV file.*
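As a possible alternative to the explicit loops, the same report can be sketched with a vectorized run-labelling idiom: compare each value with the previous row via `shift`, then take the cumulative sum so that every run gets its own number. The helper name `run_durations`, the `run_id` label, and the rounding to one decimal are assumptions made for this sketch, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'Time': [1.1, 2.2, 3.4, 4.5, 5.6, 6.2, 7.4, 8.5, 9.8, 10.1, 11.2],
    'colA': [2, 2, 3, 3, 4, 4, 4, 2, 2, 2, 2],
    'colB': [2, 2, 5, 5, 5, 6, 6, 6, 5, 5, 5],
})

def run_durations(frame, column, time_col='Time'):
    """Total duration per unique value of `column`, summed over its runs."""
    # A new run starts wherever the value differs from the previous row;
    # the cumulative sum of those starts numbers the runs 1, 2, 3, ...
    run_id = frame[column].ne(frame[column].shift()).cumsum().rename('run_id')

    # One row per run: the value it holds and its first and last timestamp
    runs = frame.groupby(run_id).agg(
        unique_value=(column, 'first'),
        start=(time_col, 'first'),
        end=(time_col, 'last'),
    )
    runs['Duration'] = runs['end'] - runs['start']

    # Add up the runs that share the same value
    totals = runs.groupby('unique_value', as_index=False)['Duration'].sum()
    totals['Duration'] = totals['Duration'].round(1)  # hide float noise
    totals.insert(0, 'Col_name', column)
    return totals

# Stack the per-column reports into one table
report = pd.concat(
    [run_durations(df, col) for col in ['colA', 'colB']],
    ignore_index=True,
)
print(report)
```

Calling `report.to_csv('report.csv', index=False)` afterwards would write the table in the layout requested in the question. Durations here are plain numbers in the same unit as the Time column; the trailing "s" from the expected output is not added.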