Separating repeated column values

Question (2 votes, 2 answers)

This has probably been asked before...

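I have very long files that record values such as time, temperature, etc., depending on which step the input function is currently at. The "Step" column records the current step, in this case steps 1 to 20. The steps can repeat, as shown below (a small portion of the data):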

import pandas as pd

raw_data = [{'Date': '1-10-19', 'Read': 1.1, 'Step': 1},
            {'Date': '2-10-19', 'Read': 1.11, 'Step': 1},
            {'Date': '3-10-19', 'Read': 10.1, 'Step': 2},
            {'Date': '4-10-19', 'Read': 10.11, 'Step': 2},
            {'Date': '5-10-19', 'Read': 1.2, 'Step': 1},
            {'Date': '6-10-19', 'Read': 1.21, 'Step': 1},
            {'Date': '7-10-19', 'Read': 10.2, 'Step': 2},
            {'Date': '8-10-19', 'Read': 10.21, 'Step': 2}]

df = pd.DataFrame(raw_data)
print(df)

      Date   Read  Step
0  1-10-19   1.10     1
1  2-10-19   1.11     1
2  3-10-19  10.10     2
3  4-10-19  10.11     2
4  5-10-19   1.20     1
5  6-10-19   1.21     1
6  7-10-19  10.20     2
7  8-10-19  10.21     2

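I need to keep track of each group of consecutive rows with the same step over time, numbering the repeated groups like this: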

      Date   Read  Step  Step_New
0  1-10-19   1.10     1       1.1
1  2-10-19   1.11     1       1.1
2  3-10-19  10.10     2       2.1
3  4-10-19  10.11     2       2.1
4  5-10-19   1.20     1       1.2
5  6-10-19   1.21     1       1.2
6  7-10-19  10.20     2       2.2
7  8-10-19  10.21     2       2.2

How can I add this new column? Ultimately, I will groupby this column to compute statistics for each new individual step.
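A vectorized sketch of one way to build this column (a technique swapped in here, not taken from either answer below): a new run of a step begins wherever Step differs from the row above, so Series.shift plus Series.ne flags each run start, and a cumulative sum of those flags within each Step value numbers the runs. Keeping Step_New as a string is a deliberate assumption: as a float, run 10 ("1.10") would be indistinguishable from run 1 ("1.1"). The last lines show the kind of groupby statistics the question mentions; count and mean are only illustrative choices.

```python
import pandas as pd

raw_data = [{'Date': '1-10-19', 'Read': 1.1, 'Step': 1},
            {'Date': '2-10-19', 'Read': 1.11, 'Step': 1},
            {'Date': '3-10-19', 'Read': 10.1, 'Step': 2},
            {'Date': '4-10-19', 'Read': 10.11, 'Step': 2},
            {'Date': '5-10-19', 'Read': 1.2, 'Step': 1},
            {'Date': '6-10-19', 'Read': 1.21, 'Step': 1},
            {'Date': '7-10-19', 'Read': 10.2, 'Step': 2},
            {'Date': '8-10-19', 'Read': 10.21, 'Step': 2}]

df = pd.DataFrame(raw_data)

# True on the first row of every run, i.e. wherever Step differs from the row above
# (the first row compares against NaN, so it is always a run start)
run_start = df["Step"].ne(df["Step"].shift())

# Cumulative count of run starts within each Step value: 1st run -> 1, 2nd run -> 2, ...
run_number = run_start.astype(int).groupby(df["Step"]).cumsum()

# Keep the label as a string so that run 10 ("1.10") stays distinct from run 1 ("1.1")
df["Step_New"] = df["Step"].astype(str) + "." + run_number.astype(str)
print(df)

# Per-run statistics on the new column
stats = df.groupby("Step_New")["Read"].agg(["count", "mean"])
print(stats)
```

Because no row-by-row Python function is involved, this should scale to very long files better than an apply-based loop.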

Tags: python pandas dataframe

2 answers

Answer 1 (2 votes)

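You can use a memo (a dictionary) to remember how many times each particular step has occurred so far. Then use apply to create the new column.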

import pandas as pd

raw_data = [{'Date': '1-10-19', 'Read': 1.1, 'Step': 1},
            {'Date': '2-10-19', 'Read': 1.11, 'Step': 1},
            {'Date': '3-10-19', 'Read': 10.1, 'Step': 2},
            {'Date': '4-10-19', 'Read': 10.11, 'Step': 2},
            {'Date': '5-10-19', 'Read': 1.2, 'Step': 1},
            {'Date': '6-10-19', 'Read': 1.21, 'Step': 1},
            {'Date': '7-10-19', 'Read': 10.2, 'Step': 2},
            {'Date': '8-10-19', 'Read': 10.21, 'Step': 2}]

df = pd.DataFrame(raw_data)

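# step_memory: how many runs of each step value have started so far
step_memory = {}
# Step value of the previous row (-1 means no row seen yet)
last_step = -1

def calculate_new_step(row):
    global last_step
    step = row['Step']
    output = str(step) + "."
    if step == last_step:
        # Still inside the current run: reuse its number
        output += str(step_memory[step])
    else:
        # A new run of this step starts: increment its counter
        last_step = step
        step_memory[step] = step_memory.get(step, 0) + 1
        output += str(step_memory[step])
    return float(output)  # remove float() to keep the label as a string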

df['Step_New'] = df.apply(calculate_new_step, axis=1)
print(df)

Output:

      Date   Read  Step  Step_New
0  1-10-19   1.10     1       1.1
1  2-10-19   1.11     1       1.1
2  3-10-19  10.10     2       2.1
3  4-10-19  10.11     2       2.1
4  5-10-19   1.20     1       1.2
5  6-10-19   1.21     1       1.2
6  7-10-19  10.20     2       2.2
7  8-10-19  10.21     2       2.2
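Two caveats with the code above, plus a sketch that avoids them. The counters live in module-level globals, so calling apply a second time (or on another DataFrame) continues counting from where the previous call stopped. And float(output) drops the trailing zero, so once a step repeats ten times its label "1.10" becomes equal to 1.1. The helper label_runs below is hypothetical, not part of pandas: it keeps the same memo logic in local variables and returns string labels.

```python
import pandas as pd

def label_runs(steps):
    """Return 'step.occurrence' labels; state is local, so each call starts fresh."""
    counts = {}       # step value -> number of runs of it started so far
    last_step = None  # step value of the previous row
    labels = []
    for step in steps:
        if step != last_step:
            # A new run begins: bump the counter for this step value
            counts[step] = counts.get(step, 0) + 1
            last_step = step
        labels.append(f"{step}.{counts[step]}")
    return labels

# float() drops the trailing zero, so run 10 of step 1 would collide with run 1
print(float("1.10") == float("1.1"))  # True

df = pd.DataFrame({"Step": [1, 1, 2, 2, 1, 1, 2, 2]})
df["Step_New"] = label_runs(df["Step"])
print(df["Step_New"].tolist())
# ['1.1', '1.1', '2.1', '2.1', '1.2', '1.2', '2.2', '2.2']
```

The labels stay strings, so they still work as groupby keys without any float rounding concerns.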

Answer 2 (2 votes)

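Here is an alternative that derives the decimal part of each step from the Read value, shown on a slightly extended sample: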

       Date    Read  Step
0   1-10-19    1.10     1
1   2-10-19    1.11     1
2   3-10-19   10.10     2
3   4-10-19   10.11     2
4   5-10-19    1.20     1
5   6-10-19    1.21     1
6   7-10-19   10.20     2
7   8-10-19   10.21     2
8   8-10-19  100.10     3
9   8-10-19  100.11     3
10  6-10-19    1.22     1
11  6-10-19    1.31     1

df["Step_New"] = df["Step"] + (df["Read"] - df["Read"].astype(int)).round(1)

Output:

       Date    Read  Step  Step_New
0   1-10-19    1.10     1       1.1
1   2-10-19    1.11     1       1.1
2   3-10-19   10.10     2       2.1
3   4-10-19   10.11     2       2.1
4   5-10-19    1.20     1       1.2
5   6-10-19    1.21     1       1.2
6   7-10-19   10.20     2       2.2
7   8-10-19   10.21     2       2.2
8   8-10-19  100.10     3       3.1
9   8-10-19  100.11     3       3.1
10  6-10-19    1.22     1       1.2
11  6-10-19    1.31     1       1.3
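Note that this answer reads the decimal part from the Read values rather than from the order of the steps, so it only reproduces the requested labels when each reading's first decimal digit happens to equal the run number. In the extended sample, rows 10 and 11 form one contiguous (third) run of step 1, yet they receive 1.2 and 1.3. The sketch below puts that rule next to a run-based count (shift/ne/cumsum, a technique not used in this answer) on the same data; the names read_based and run_based are illustrative.

```python
import pandas as pd

# The extended sample from this answer (Date omitted; it does not affect the labels)
df = pd.DataFrame({
    "Read": [1.10, 1.11, 10.10, 10.11, 1.20, 1.21, 10.20, 10.21, 100.10, 100.11, 1.22, 1.31],
    "Step": [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 1, 1],
})

# This answer's rule: Step plus the first decimal of Read (elementwise, so no groupby needed)
read_based = df["Step"] + (df["Read"] - df["Read"].astype(int)).round(1)

# Run-based rule: count how many times each Step value has started a new run
run_start = df["Step"].ne(df["Step"].shift())
run_number = run_start.astype(int).groupby(df["Step"]).cumsum()
run_based = df["Step"].astype(str) + "." + run_number.astype(str)

print(pd.DataFrame({"Step": df["Step"], "read_based": read_based, "run_based": run_based}))
```

The two columns agree on rows 0 to 9 and diverge on rows 10 and 11, which is where the readings stop encoding the run number.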