I have the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame([])
df['Date'] = ['2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20']
df['Machine'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A']
df['Signal'] = [0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1]
df['Status'] = 0
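The function below generates the "Status" column for machine A. In the Signal column, a 1 switches the machine on (Status 1), and Status stays 1 until the machine receives a 2 or a 3 (the signals that switch it off). Status then becomes 0 (off) and stays 0 until the machine receives a 1 again.
I have solved the problem of carrying the previous row's Status (1 or 0) forward with the following function: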
def s_gen(dataset, Signal):
    _status = 0
    status0 = []
    for (i) in Signal:
        if _status == 0:
            if i == 1:
                _status = 1
        elif _status == 1:
            if (i == 2 or i==3):
                _status = 0
        status0.append(_status)
    dataset['status0'] = status0
    return dataset['status0']

df['Status'] = s_gen(df,df['Signal'])
df.drop('status0',axis=1,inplace = True)
df
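This appends the newly created column to the DataFrame. However, I have a much larger DataFrame with many different values in the Machine column (grouped in runs: A, A, A, B, B, B, and so on), and the function's results must not carry over from one machine to the next. Using groupby did not work. So I think the next step is to generate each machine's Status sequence as a separate list, concatenate the lists, and then attach the whole series to the larger DataFrame as part of an outer loop.
This is the desired result: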
df = pd.DataFrame([])
df['Date'] = ['2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20',
'2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20']
df['Machine'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B',]
df['Signal'] = [0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1,0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1]
df['Status'] = [0,1,0,0,1,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,0,1,0,0,1,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1]
df
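What I am struggling with is this: if the function processes each machine's data separately, it has to loop over every machine, concatenate all the resulting Status series, and then attach the combined series to the DataFrame.
This is what I have tried so far: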
dfList = df[df['Machine']]
dfListU = pd.DataFrame([])
dfListU = dfList['Machine'].unique()
dfListU.flatten()

def s_gen2(item, dataset, Signal):
    data = df[df.Machine==m]
    for m in dfListU:
        _status = 0
        status0 = []
        for (i) in Signal:
            if _status == 0:
                if i == 1:
                    _status = 1
            elif _status == 1:
                if (i == 2 or i==3):
                    _status = 0
            #status0.append(_status)
    dataset['status0'] = status0
    return dataset['status0']
    for i in dfListU:
        df1 = pd.concat(i)
        status0.append(_status)

df['Status'] = s_gen(df,df['Signal'])
df.drop('status0',axis=1,inplace = True)
df
This results in an error: KeyError: "None of [Index(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'], dtype='object')] are in the [columns]"
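Would it be better to loop the function over dfListU (the list of unique machines) and then concatenate the results? I have tried to avoid loops, but I could not find any other way to compare the previous row's Status with the Signal value in the current row.
I would sincerely appreciate any help.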
Some lines in the s_gen2 block are confusing, and I doubt it would run. For example, in the lines below, m is used before it is assigned.
data = df[df.Machine==m]
for m in dfListU:
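As far as I can tell, the KeyError itself comes from an earlier line, dfList = df[df['Machine']]. Indexing a DataFrame with a Series of strings makes pandas treat the values ('A', 'B', ...) as column labels, and no such columns exist. A minimal sketch on a made-up three-row frame (the data here is only for illustration):

```python
import pandas as pd

# Tiny illustrative frame, not the data from the question.
df = pd.DataFrame({"Machine": ["A", "A", "B"], "Signal": [0, 1, 2]})

# The values of df["Machine"] are interpreted as column labels,
# so pandas looks for columns named 'A' and 'B' and raises KeyError.
try:
    df[df["Machine"]]
    raised = False
except KeyError:
    raised = True

# The unique machine names can be read from the column directly.
unique_machines = df["Machine"].unique()
print(raised, list(unique_machines))
```

So the intermediate dfList is not needed at all; df['Machine'].unique() already gives the list to loop over.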
In any case, since your machine list is already grouped, s_gen can be reused with a small adjustment so that it leaves the DataFrame untouched.
df = pd.DataFrame([])
df['Date'] = ['2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20',
'2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20']
df['Machine'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B',]
df['Signal'] = [0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1,0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1]
def s_gen(Signal):
    _status = 0
    status0 = []
    for (i) in Signal:
        if _status == 0:
            if i == 1:
                _status = 1
        elif _status == 1:
            if (i == 2 or i==3):
                _status = 0
        status0.append(_status)
    return status0

unique_machines = df['Machine'].unique()
whole_status_list = []
for m in unique_machines:
    data = df[df.Machine==m]
    whole_status_list.extend(s_gen(data["Signal"]))
df["Status"] = whole_status_list
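Note that extending one flat list only lines up with the rows because each machine's rows are contiguous. If the rows were interleaved, a groupby variant should still work, since transform aligns each group's result back to its original row positions. This is a sketch under that assumption, on a small made-up frame with a compact rewrite of s_gen; the lambda wrapper and the sample data are mine, not part of the question:

```python
import pandas as pd

def s_gen(signal):
    """Return the on/off status list for one machine's signal sequence."""
    status = 0
    out = []
    for s in signal:
        if status == 0 and s == 1:
            status = 1            # signal 1 switches the machine on
        elif status == 1 and s in (2, 3):
            status = 0            # signals 2 and 3 switch it off
        out.append(status)
    return out

# Illustrative data with interleaved machines (not from the question).
df = pd.DataFrame({
    "Machine": ["A", "B", "A", "B", "A", "B"],
    "Signal":  [1,   0,   0,   1,   2,   0],
})

# Run s_gen once per machine; returning a Series with the group's index
# lets pandas place each value back on the row it came from.
df["Status"] = df.groupby("Machine")["Signal"].transform(
    lambda s: pd.Series(s_gen(s), index=s.index)
)
print(df)
```

Each machine starts from status 0 here, so one machine's state never leaks into the next.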
The code above should help.
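On avoiding the explicit loop: the status at any row is determined by the most recent non-zero signal for that machine (1 means on, 2 or 3 means off), so it can probably be computed with a map followed by a per-machine forward fill. This is a sketch of that idea rather than a drop-in replacement; it assumes signals only take the values 0, 1, 2 and 3, and that every machine starts in the off state:

```python
import pandas as pd

signal = [0, 1, 2, 0, 1, 3, 0, 0, 0, 3, 0, 1, 0, 0, 3, 0, 1, 0, 0, 1]
df = pd.DataFrame({
    "Machine": ["A"] * 20 + ["B"] * 20,
    "Signal": signal * 2,
})

# 1 -> on, 2/3 -> off; signal 0 is not in the mapping and becomes NaN,
# meaning "keep whatever the previous status was".
events = df["Signal"].map({1: 1, 2: 0, 3: 0})

# Forward-fill within each machine so state never crosses machines,
# then treat rows before the first event as off (0).
df["Status"] = events.groupby(df["Machine"]).ffill().fillna(0).astype(int)
print(df["Status"].tolist())
```

On the sample data this reproduces the desired Status column from the question for both machines.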