Plot only the stable points in a time series and perform a linear regression

Question    Votes: 0    Answers: 1

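I am asking two questions in one help request, so I hope this does not clutter things here.

I have spent quite some time on this problem without success so far. I am trying to plot only those points in a series that lie close to each other, and not the transitions between them (see the figure below). I probably need an if condition along the lines of: if x(2) - x(1) < 0.005, plot the point, otherwise do not. Later I want to run a linear regression on these points (that is why I want to exclude the transitions). Can you please help me with plotting under this condition and with the linear regression?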

enter image description here

Here is my code:


import pandas as pd
import matplotlib.pyplot as plt

# which measurements and fields to read from the log data and plot
desired_field1= "x y Box"
desired_value1 = "x1 [um]"
desired_field2= "x y Box"
desired_value2 =  "x3 [um]"
desired_field3= "LM Position"
desired_value3 =  "Z [um]"


# extracting desired data from logging
data=pd.read_excel(r"test2.xlsx", sheet_name='Sheet1')
data = data[(data['_time'] < '2024-05-21T09:49:37.6089875Z') & (data['_time'] > '2024-05-21T09:43:31.7141954Z')] #selecting desired time interval
data_measurement1 = data.loc[data['_measurement'] == desired_field1]
data_field1 = data_measurement1.loc[data_measurement1['_field'] == desired_value1].copy() # copy so '_time' can be reassigned below without a SettingWithCopyWarning
data_measurement2 = data.loc[data['_measurement'] == desired_field2]
data_field2 = data_measurement2.loc[data_measurement2['_field'] == desired_value2].copy()
data_measurement3 = data.loc[data['_measurement'] == desired_field3]
data_field3 = data_measurement3.loc[data_measurement3['_field'] == desired_value3].copy()
values1 = list(data_field1['_value']) #values we are interested in
values2 = list(data_field2['_value'])
values3 = list(data_field3['_value'])
#....

mean_xs = [(g + h) / 2 for g, h in zip(values1, values2)]
LM_mean = [50-x for x in mean_xs]

#start plotting
data_field1['_time'] = pd.to_datetime(data_field1['_time'].str.split().str[-1])
data_field2['_time'] = pd.to_datetime(data_field2['_time'].str.split().str[-1])
data_field3['_time'] = pd.to_datetime(data_field3['_time'].str.split().str[-1])
plt.plot(data_field1['_time'], values1, '-', label = desired_value1)
plt.plot(data_field2['_time'], values2, '-', label = desired_value2 )
plt.plot(data_field3['_time'], values3, '-', label = desired_value3)


plt.xlabel('time [D hh:mm]')
plt.ylabel(' x [um] MCS')
plt.legend(loc='best')
plt.gca().yaxis.grid(True)

plt.figure()
plt.plot(LM_mean, values3, 'o')

Sample data:

9988   2024-05-21T09:46:00.1164445Z  1294.005333
9989   2024-05-21T09:46:01.1115275Z  1294.005333
9990   2024-05-21T09:46:02.1254956Z  1294.005667
9991   2024-05-21T09:46:03.1191685Z  1294.005667
9992   2024-05-21T09:46:04.1325494Z  1294.005333
9993   2024-05-21T09:46:05.1268794Z  1294.005333
9994   2024-05-21T09:46:06.1409297Z  1294.005333
9995   2024-05-21T09:46:07.1346292Z  1294.005000
9996   2024-05-21T09:46:08.1488069Z  1294.005333
9997   2024-05-21T09:46:09.1417524Z  1294.005333
9998   2024-05-21T09:46:10.1563002Z  1294.005333
9999   2024-05-21T09:46:11.1692835Z  1294.005333
10000  2024-05-21T09:46:12.1642492Z  1332.747333
10001  2024-05-21T09:46:13.1977216Z  1344.011333
10002  2024-05-21T09:46:14.1926256Z  1344.012000
10003  2024-05-21T09:46:15.2062685Z  1344.011667
10004  2024-05-21T09:46:16.2200463Z  1344.011667
10005  2024-05-21T09:46:17.2339343Z  1344.012000
10006  2024-05-21T09:46:18.2479639Z  1344.012000
10007  2024-05-21T09:46:19.2405515Z  1344.012000
10008  2024-05-21T09:46:20.2556817Z  1344.012000

I have tried searching for this, but without success.

python matplotlib datetime scikit-learn time-series
1 Answer
0
votes

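You have a very clean signal, and the steps can be detected easily if you round the numbers to integers. I ended up generating a toy dataset, because your example contains only one step and so it was not a very robust test. For now I am only including the method: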

import numpy as np

diffY = np.diff(y) # first difference of y: diffY[i] = y[i+1] - y[i]
idxSteps = np.array(np.where(np.abs(diffY) > 1)) # indices where the absolute difference exceeds 1, i.e. the steps
previousStep = 0 # start index of the current plateau
xMid = list() # init list for x
yMid = list() # same for y
for currentStep in idxSteps[0]: # for every step detected
    if currentStep != previousStep: # skip plateaus that consist of a single (transition) sample
        dummyX = x[previousStep:currentStep + 1] # data of the plateau that ends at this step
        dummyY = y[previousStep:currentStep + 1] # same for y
        xMid.append(np.median(dummyX)) # append the median to the x list
        yMid.append(np.min(dummyY)) # append the minimum to the y list
    previousStep = currentStep + 1 # the next plateau starts right after the step
# note: the samples after the last detected step are not included

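In the code above:

  • The difference of the data is used as an indicator of a jump
  • The jump positions are used to locate the data between two jumps, so that a midpoint can be generated
  • This is repeated for every step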
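The same steps can also be written without an explicit index loop. The following is only a sketch under my own assumptions, not the exact code used for the figure: the function name `plateau_midpoints` and the parameters `jump` and `min_len` are made up for illustration, and the demo values are rounded numbers shaped like the sample data in the question. It splits the signal at every jump, drops runs shorter than `min_len` samples (such as a single sample caught mid-transition), and also keeps the plateau after the last jump.

```python
import numpy as np

def plateau_midpoints(x, y, jump=1.0, min_len=2):
    """Return one (median x, median y) pair per plateau between jumps.

    jump and min_len are illustrative parameters, not values from the question.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # diff[i] = y[i+1] - y[i], so a jump at i means sample i+1 starts a new run
    breaks = np.flatnonzero(np.abs(np.diff(y)) > jump) + 1
    x_mid, y_mid = [], []
    for xs, ys in zip(np.split(x, breaks), np.split(y, breaks)):
        if len(ys) < min_len:
            continue  # too short to be a plateau, e.g. a transition sample
        x_mid.append(np.median(xs))
        y_mid.append(np.median(ys))
    return np.array(x_mid), np.array(y_mid)

# Demo: twelve samples on one level, one transition sample, eight on the next
y_demo = np.array([1294.005] * 12 + [1332.747] + [1344.012] * 8)
x_demo = np.arange(y_demo.size)
x_mid, y_mid = plateau_midpoints(x_demo, y_demo)
```

Here the median of y is used instead of the minimum, which is my choice for the sketch; swap in `np.min` if the minimum is what you need.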

The result after applying this code is the following:

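import matplotlib.pyplot as plt

plt.figure()
plt.plot(x, y, label="Raw") # plot original
plt.scatter(xMid, yMid, 10, color="r", marker="v", label="Middle") # scatter generated points
fit = np.polyfit(xMid, yMid, 1) # first-degree (linear) fit through the midpoints
plt.plot(x, np.polyval(fit, x), "k--", label="Fitted") # plot fit
plt.grid() # apply grid
plt.legend() # labels are attached to each artist, so the order cannot get mixed up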

results
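If you would rather stay close to the if condition from the question, a boolean mask may be enough. This is a minimal sketch, assuming `x` and `y` are numeric arrays of equal length; the helper name `stable_mask`, the rule "keep a sample if at least one neighbour is within the tolerance", and the demo numbers are my own assumptions. `np.polyfit` needs numeric x values, so timestamps would have to be converted to numbers first (for example seconds since the first sample).

```python
import numpy as np

def stable_mask(y, tol=0.005):
    """True for samples that have at least one neighbour closer than tol."""
    y = np.asarray(y, dtype=float)
    if y.size == 0:
        return np.zeros(0, dtype=bool)
    small = np.abs(np.diff(y)) < tol            # small[i]: step from i to i+1 is small
    before = np.concatenate(([False], small))   # step into sample i is small
    after = np.concatenate((small, [False]))    # step out of sample i is small
    return before | after

# Made-up demo signal: two plateaus with one transition sample (3.5) in between
x = np.arange(10, dtype=float)
y = np.array([1.000, 1.001, 1.000, 1.002, 3.5,
              6.000, 6.001, 6.000, 6.002, 6.001])

keep = stable_mask(y)                            # drops only the transition sample
slope, intercept = np.polyfit(x[keep], y[keep], 1)  # linear regression on stable points
```

Plotting `x[keep]`, `y[keep]` with a marker style such as `'o'` then shows only the stable points. Whether to fit all stable points or one midpoint per plateau depends on how much weight long plateaus should get.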

Try applying this approach, and come back if you have more questions. I hope this helps.

The only imports used so far are matplotlib and numpy.
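The toy dataset itself is not reproduced here. As a stand-in, a staircase signal with a little noise could be generated as in the sketch below; the number of steps, the step height, the plateau length and the noise level are all made-up values chosen only to resemble the signal in the question.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the toy data is reproducible

n_steps = 6                      # number of plateaus (assumed)
n_per_step = 50                  # samples per plateau (assumed)
levels = 1294.0 + 50.0 * np.arange(n_steps)   # plateau heights, 50 apart (assumed)

# Repeat each level n_per_step times and add small Gaussian noise
y = np.repeat(levels, n_per_step) + rng.normal(0.0, 0.0003, n_steps * n_per_step)
x = np.arange(y.size, dtype=float)            # sample index used as the x axis
```

With these `x` and `y` the step detection above should find five jumps, one between each pair of neighbouring plateaus.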
