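This may have been asked before...
I have long files that log values such as time and temperature, which depend on the step the input function is currently in. The "Step" column records the current step, in this case one of steps 1 to 20. The steps can repeat, as shown below (a small part of the data):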
import pandas as pd
raw_data = [{'Date': '1-10-19', 'Read': 1.1, 'Step': 1},
{'Date': '2-10-19', 'Read': 1.11, 'Step': 1},
{'Date': '3-10-19', 'Read': 10.1, 'Step': 2},
{'Date': '4-10-19', 'Read': 10.11, 'Step': 2},
{'Date': '5-10-19', 'Read': 1.2, 'Step': 1},
{'Date': '6-10-19', 'Read': 1.21, 'Step': 1},
{'Date': '7-10-19', 'Read': 10.2, 'Step': 2},
{'Date': '8-10-19', 'Read': 10.21, 'Step': 2}]
df = pd.DataFrame(raw_data)
      Date   Read  Step
0  1-10-19   1.10     1
1  2-10-19   1.11     1
2  3-10-19  10.10     2
3  4-10-19  10.11     2
4  5-10-19   1.20     1
5  6-10-19   1.21     1
6  7-10-19  10.20     2
7  8-10-19  10.21     2
I need to track the groups of consecutive steps over time, like this:
      Date   Read  Step  Step_New
0  1-10-19   1.10     1       1.1
1  2-10-19   1.11     1       1.1
2  3-10-19  10.10     2       2.1
3  4-10-19  10.11     2       2.1
4  5-10-19   1.20     1       1.2
5  6-10-19   1.21     1       1.2
6  7-10-19  10.20     2       2.2
7  8-10-19  10.21     2       2.2
How should I add this new column? Eventually I will groupby this column to run some statistics on each individual occurrence of a step.
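You can use a small memory (a dictionary) to remember how many times each step has occurred so far. Then you can create the new column with apply.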
import pandas as pd
raw_data = [{'Date': '1-10-19', 'Read': 1.1, 'Step': 1},
{'Date': '2-10-19', 'Read': 1.11, 'Step': 1},
{'Date': '3-10-19', 'Read': 10.1, 'Step': 2},
{'Date': '4-10-19', 'Read': 10.11, 'Step': 2},
{'Date': '5-10-19', 'Read': 1.2, 'Step': 1},
{'Date': '6-10-19', 'Read': 1.21, 'Step': 1},
{'Date': '7-10-19', 'Read': 10.2, 'Step': 2},
{'Date': '8-10-19', 'Read': 10.21, 'Step': 2}]
df = pd.DataFrame(raw_data)
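step_memory = {}   # step -> number of times a run of this step has started
last_step = -1     # step seen on the previous row

def calculate_new_step(row):
    global last_step
    step = row['Step']
    output = str(step) + "."
    if step == last_step:
        # Still inside the same run: reuse the current occurrence number.
        output += str(step_memory[step])
    else:
        # A new run of this step starts: bump its occurrence counter.
        last_step = step
        step_memory[step] = step_memory.get(step, 0) + 1
        output += str(step_memory[step])
    # Delete float() if you want the label as a string. Note that as a float,
    # the 10th occurrence ("1.10") is equal to the 1st ("1.1").
    return float(output)

df['Step_New'] = df.apply(calculate_new_step, axis=1)
print(df)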
      Date   Read  Step  Step_New
0  1-10-19   1.10     1       1.1
1  2-10-19   1.11     1       1.1
2  3-10-19  10.10     2       2.1
3  4-10-19  10.11     2       2.1
4  5-10-19   1.20     1       1.2
5  6-10-19   1.21     1       1.2
6  7-10-19  10.20     2       2.2
7  8-10-19  10.21     2       2.2
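The same bookkeeping can probably be done without a row-wise apply or global state. The following is only a sketch, not the approach above: the names `new_run` and `occurrence` are made up for illustration, and the labels are kept as strings so that a 10th occurrence ("1.10") stays distinct from the 1st ("1.1").

```python
import pandas as pd

# Only the Step column from the question's sample is needed here.
df = pd.DataFrame({"Step": [1, 1, 2, 2, 1, 1, 2, 2]})

# 1 on every row where the step differs from the previous row (a new run starts).
new_run = df["Step"].ne(df["Step"].shift()).astype(int)

# Within each step value, count how many runs have started so far.
occurrence = new_run.groupby(df["Step"]).cumsum()

# Build a string label "<step>.<occurrence>".
df["Step_New"] = df["Step"].astype(str) + "." + occurrence.astype(str)
print(df)
```

Because the run boundaries come from comparing each row with the previous one, this assumes the rows are already in chronological order.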
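Here is an alternative way to track the fractional step of each step group. Note that it takes the occurrence number from the first decimal digit of Read, so it only works when the readings happen to encode the cycle that way, as they do in this sample: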
       Date    Read  Step
0   1-10-19    1.10     1
1   2-10-19    1.11     1
2   3-10-19   10.10     2
3   4-10-19   10.11     2
4   5-10-19    1.20     1
5   6-10-19    1.21     1
6   7-10-19   10.20     2
7   8-10-19   10.21     2
8   8-10-19  100.10     3
9   8-10-19  100.11     3
10  6-10-19    1.22     1
11  6-10-19    1.31     1
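# group_keys=False keeps the original row index on the result so it aligns with df.Step
# (recent pandas versions otherwise prepend the group key to the index).
df["Step_New"] = df.Step + df.groupby('Step', group_keys=False)['Read'].apply(lambda x: round(x - x.astype(int), 1))
Output: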
       Date    Read  Step  Step_New
0   1-10-19    1.10     1       1.1
1   2-10-19    1.11     1       1.1
2   3-10-19   10.10     2       2.1
3   4-10-19   10.11     2       2.1
4   5-10-19    1.20     1       1.2
5   6-10-19    1.21     1       1.2
6   7-10-19   10.20     2       2.2
7   8-10-19   10.21     2       2.2
8   8-10-19  100.10     3       3.1
9   8-10-19  100.11     3       3.1
10  6-10-19    1.22     1       1.2
11  6-10-19    1.31     1       1.3