我有两个数据框,一个名为传感器,一个名为火车。 传感器数据帧包含时间序列的数据,其索引为 ts_sensor 列。在传感器数据框中,列的名称标识传感器的名称,每个单元格填充传感器在 ts_sensor 时刻假定的值。训练数据帧包含一个 id_sensor 列,其中填充了传感器的名称(如传感器数据帧中所示)和 boot_threshold 列,其中每个单元格填充了每个传感器的阈值。
这是与上述数据框相关的信息。
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 264 entries, 0 to 263
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id_train 264 non-null object
1 asset_id 264 non-null object
2 train_type 264 non-null object
3 equipment_type 264 non-null object
4 id_sensor 264 non-null object
5 unit 262 non-null object
6 is_operating 262 non-null float64
7 will_be_available 262 non-null float64
8 pi_acquisition_date 0 non-null float64
9 pi_description 264 non-null object
10 sensor_description 264 non-null object
11 tag_id 262 non-null float64
12 priority 262 non-null float64
13 for_ml 264 non-null int64
14 onoff_parameter 264 non-null int64
15 boot_threshold 264 non-null int64
16 efficiency_index 264 non-null int64
dtypes: float64(5), int64(4), object(8)
memory usage: 35.2+ KB
sensor.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1729930 entries, 0 to 1729929
Columns: 135 entries, level_0 to Australia.Blacktip.ICSS.KPI.EE.GasTurbineEfficiency_A
dtypes: float64(132), int64(2), object(1)
memory usage: 1.7+ GB
sensor.columns
Index(['level_0', 'index', 'sensor_ts',
'Australia.Blacktip.ICSS.360140413-100.PV',
'Australia.Blacktip.ICSS.360140415-100.PV',
'Australia.Blacktip.ICSS.360140417-100.PV',
'Australia.Blacktip.ICSS.360140421-100.PV',
'Australia.Blacktip.ICSS.360140423-100.PV',
'Australia.Blacktip.ICSS.360140425-100.PV',
'Australia.Blacktip.ICSS.360140433-100.PV',
...
'Australia.Blacktip.ICSS.3601PDI144.DACA.PV',
'Australia.Blacktip.ICSS.3601PIT111.PV',
'Australia.Blacktip.ICSS.3601PIT141.PV',
'Australia.Blacktip.ICSS.3601TIT042.PV',
'Australia.Blacktip.ICSS.3601TIT052.PV',
'Australia.Blacktip.ICSS.3601TIT111.PV',
'Australia.Blacktip.ICSS.3601TIT122.PV',
'Australia.Blacktip.ICSS.3601TIT141.PV',
'Australia.Blacktip.ICSS.3601TIT152.PV',
'Australia.Blacktip.ICSS.KPI.EE.GasTurbineEfficiency_A'],
dtype='object', length=135)
我想获得一个新的数据帧,其中时间序列受到尊重,并且仅存在 ts_sensor 值当前高于相应 boot_threshold 值的传感器。
谢谢大家
你在找那个吗?
import pandas as pd
import numpy as np
# Simulated sensor data (a small part of the actual data for demonstration)
sensor_data = {
'sensor_ts': pd.date_range(start='2023-01-01', periods=4, freq='D'),
'Australia.Blacktip.ICSS.360140413-100.PV': [100, 200, 300, 400],
'Australia.Blacktip.ICSS.360140415-100.PV': [100, 220, 330, 440],
}
sensor = pd.DataFrame(sensor_data)
# Simulated train data
train_data = {
'id_sensor': ['Australia.Blacktip.ICSS.360140413-100.PV', 'Australia.Blacktip.ICSS.360140415-100.PV'],
'boot_threshold': [250, 300],
}
train = pd.DataFrame(train_data)
# Reshape sensor DataFrame from wide to long format
sensor_long = pd.melt(sensor, id_vars=['sensor_ts'], var_name='id_sensor', value_name='value')
# Merge the reshaped sensor data with the train data
merged_data = pd.merge(sensor_long, train, on='id_sensor')
# Filter to include only rows where sensor value is greater than the boot_threshold
filtered_data = merged_data[merged_data['value'] > merged_data['boot_threshold']]
# Displaying the filtered data for verification
filtered_data
输出:
sensor_ts id_sensor value boot_threshold
2 2023-01-03 Australia.Blacktip.ICSS.360140413-100.PV 300 250
3 2023-01-04 Australia.Blacktip.ICSS.360140413-100.PV 400 250
6 2023-01-03 Australia.Blacktip.ICSS.360140415-100.PV 330 300
7 2023-01-04 Australia.Blacktip.ICSS.360140415-100.PV 440 300