Percentage of a time series within a specific range, per day

Question. Votes: -1. Answers: 1

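I have a large time series dataset that records temperature over time. Each row has a datetime and the corresponding temperature. I want to work out the percentage of time the temperature spends within a specific range.

I want to go through this DataFrame and calculate, for each day, the percentage of temperature readings between 10 and 20 degrees. This should produce a new DataFrame with one row per day, giving the percentage of time the device was in range. The point is to see how the in-range percentage changes from day to day, rather than computing a single percentage for the whole DataFrame.

How can I do this more efficiently than what I have tried?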

df1 = df[(df['date'] > '2019-01-01') & (df['date'] <= '2019-01-02')]
df2 = df[(df['date'] > '2019-01-02') & (df['date'] <= '2019-01-03')]
df3 = df[(df['date'] > '2019-01-03') & (df['date'] <= '2019-01-04')]
df4 = df[(df['date'] > '2019-01-04') & (df['date'] <= '2019-01-05')]
df5 = df[(df['date'] > '2019-01-05') & (df['date'] <= '2019-01-06')]
df6 = df[(df['date'] > '2019-01-06') & (df['date'] <= '2019-01-07')]
df7 = df[(df['date'] > '2019-01-07') & (df['date'] <= '2019-01-08')]

condition1 = df1[(df1.temp >= 10.0) & (df1.temp <=20.0)]
condition2 = df2[(df2.temp >= 10.0) & (df2.temp <=20.0)]
condition3 = df3[(df3.temp >= 10.0) & (df3.temp <=20.0)]
condition4 = df4[(df4.temp >= 10.0) & (df4.temp <=20.0)]
condition5 = df5[(df5.temp >= 10.0) & (df5.temp <=20.0)]
condition6 = df6[(df6.temp >= 10.0) & (df6.temp <=20.0)]
condition7 = df7[(df7.temp >= 10.0) & (df7.temp <=20.0)]

percentage1 = (len(condition1)/len(df1))*100
percentage2 = (len(condition2)/len(df2))*100
percentage3 = (len(condition3)/len(df3))*100
percentage4 = (len(condition4)/len(df4))*100
percentage5 = (len(condition5)/len(df5))*100
percentage6 = (len(condition6)/len(df6))*100
percentage7 = (len(condition7)/len(df7))*100
python pandas dataframe
1 Answer
0
votes

Assuming your data is sampled at regular intervals, you can try this:

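in_range = df[(df['temp'] >= 10) & (df['temp'] <= 20)]['temp'] #readings between 10 and 20 inclusive, using the question's 'temp' column
df2 = 100 * in_range.resample('1d').count().divide(df['temp'].resample('1d').count()) #percent per day; df needs a DatetimeIndex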
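
A variation on the same idea, as a minimal sketch rather than a definitive solution: build a boolean mask first and take its daily mean, since the mean of a True/False series is the fraction of True values. The sample data and the names in_range and daily_percent are made up for illustration; the temp column and the 10 to 20 bounds follow the question. It assumes df has a DatetimeIndex and, like the snippet above, it counts readings, so it equals the percentage of time only if the sampling is regular.

```python
import pandas as pd

# Hypothetical sample data: one reading every 6 hours over two days
idx = pd.date_range("2019-01-01", periods=8, freq="6h")
df = pd.DataFrame(
    {"temp": [5.0, 12.0, 18.0, 25.0, 9.0, 10.0, 20.0, 15.0]},
    index=idx,
)

# True where the reading lies in [10, 20]; between() is inclusive by default
in_range = df["temp"].between(10, 20)

# Cast to float so the daily mean is the fraction of in-range readings,
# then scale to a percentage
daily_percent = in_range.astype(float).resample("D").mean() * 100

print(daily_percent)
```

With this approach a day that has readings but none in range comes out as 0, while a calendar day with no readings at all comes out as NaN.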

0
votes

Something like this should work for you:

df['date']=pd.to_datetime(df['date']) #not necessary if your dates are already in datetime format
df.set_index('date',inplace=True) #make date the index

all_days=df.index.normalize().unique() #get all unique days in timeseries

df2=pd.DataFrame(columns=['date','percent']) #create new df to store results
df2['date']=all_days #make date column equal to the unique days
df2.set_index('date',inplace=True) #make date column the index

for i,row in df2.iterrows(): #iterate through each row of df2
    iloc = df2.index.get_loc(i) #get index location
    daily_df = df[(df.index >= df2.index[iloc]) & (df.index < df2.index[iloc+1])] #get reduced df for that day (assuming it starts at midnight and ends at 23:59:59)
    total_count = daily_df.shape[0] #number of temp readings that day
    above_count = daily_df[(daily_df['temp'] >= 10) & (daily_df['temp'] <= 20)].values.shape[0] #number of temp readings between 10 and 20
    df2.loc[i,'percent']=100*above_count/total_count #assign percent column the percentage of values between 10 and 20 (.loc writes into df2; the chained df2.iloc[iloc]['percent'] would assign to a temporary copy)

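There is surely a pandas function I don't know about that would simplify this code, but this is a good start.

You will need to handle the last day separately, because there is no following day in the index to mark its end (df2.index[iloc+1] raises an IndexError).

Edit

Replace the daily_df line with: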

daily_df = df[df.index.normalize() == df2.index[iloc]]

and it will no longer crash on the last day.
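
The loop above can probably be replaced by a groupby on the calendar day, which also avoids any special handling of the last day. The following is only a sketch under assumptions: the sample data is invented, and the names day, grouped and result are hypothetical; the date and temp columns and the inclusive 10 to 20 range follow the question.

```python
import pandas as pd

# Hypothetical sample data with a 'date' column and a 'temp' column
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:00", "2019-01-01 12:00",
        "2019-01-02 00:00", "2019-01-02 08:00", "2019-01-02 16:00",
        "2019-01-03 06:00",
    ]),
    "temp": [12.0, 25.0, 10.0, 20.0, 9.5, 30.0],
})

# Boolean mask: True where the reading is between 10 and 20 inclusive
in_range = df["temp"].between(10, 20)

# Midnight of the day each reading belongs to, used as the grouping key
day = df["date"].dt.normalize().rename("day")

grouped = in_range.groupby(day)
result = pd.DataFrame({
    "readings": grouped.size(),   # number of readings that day
    "in_range": grouped.sum(),    # number of readings inside the range
})
result["percent"] = 100 * result["in_range"] / result["readings"]

print(result)
```

The result has one row per day that actually contains readings, so no day is ever divided by a count of zero.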
