Better way to filter out M-F during the hours 7AM-5PM from a dataset?

Question (votes: 1, answers: 1)

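I am experimenting with concat and pandas, trying to filter Monday through Friday, 7 AM to 5 PM, out of a dataset. So basically the only data left would be weekends at all hours, plus weekday nights from 6 PM to 6 AM.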

Make up some data:

import numpy as np
import pandas as pd
np.random.seed(11)

rows,cols = 50000,2
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='H') 

df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)

The only thing I could think of was to create 3 copies:

df_weekend = df.copy()
df_weeknights_AM = df.copy()
df_weeknights_PM = df.copy()

Filter out Monday through Friday to create a weekend dataset with all hours:

df_weekend = df_weekend[
    (df_weekend.index.dayofweek > 4)
]

Filter out weekends and everything from 7 AM onward, keeping the weekday early-morning hours:

df_weeknights_AM = df_weeknights_AM[
    (df_weeknights_AM.index.dayofweek < 5)
    &
    (df_weeknights_AM.index.strftime('%H').astype('int') < 7)
]

Filter out weekends and everything up to 5 PM, keeping the weekday evening hours:

df_weeknights_PM = df_weeknights_PM[
    (df_weeknights_PM.index.dayofweek < 5)
    &
    (df_weeknights_PM.index.strftime('%H').astype('int') > 17)
]
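A side note on the hour comparison above: a DatetimeIndex exposes the hour directly through its `.hour` attribute, so the round trip through `strftime` and `astype` should not be needed. A minimal sketch on a small made-up index (the two-week range is an assumption chosen only to keep the example quick):

```python
import numpy as np
import pandas as pd

# Two weeks of hourly timestamps; a Timedelta is used for the frequency
# so the example does not depend on the 'H' / 'h' alias spelling.
tidx = pd.date_range('2019-01-01', periods=24 * 14, freq=pd.Timedelta(hours=1))

hours_via_str = tidx.strftime('%H').astype('int')  # approach used in the question
hours_direct = tidx.hour                           # same values, no string conversion

same = np.array_equal(np.asarray(hours_via_str), np.asarray(hours_direct))
print(same)
```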

Then I try to combine all the datasets. I also tried merge, but without much luck.

df2 = pd.concat([df_weekend, df_weeknights_AM], axis=1)

df3 = pd.concat([df2, df_weeknights_PM], axis=1)

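The problem is that the output looks a bit odd: I was hoping that no duplicate columns would be created, and that everything would instead be combined on the index (timestamp) into one dataset with the same two original columns. What is the best practice? When I tried merge, I got similar duplicate columns labeled _x, _y ...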

df3


                     Temperature     Value  Temperature     Value  Temperature     Value
2019-01-01 00:00:00          NaN       NaN     0.180270  0.019475          NaN       NaN
2019-01-01 01:00:00          NaN       NaN     0.463219  0.724934          NaN       NaN
2019-01-01 02:00:00          NaN       NaN     0.420204  0.485427          NaN       NaN
2019-01-01 03:00:00          NaN       NaN     0.012781  0.487372          NaN       NaN
2019-01-01 04:00:00          NaN       NaN     0.941807  0.850795          NaN       NaN
2019-01-01 05:00:00          NaN       NaN     0.729964  0.108736          NaN       NaN
2019-01-01 06:00:00          NaN       NaN     0.893904  0.857154          NaN       NaN
2019-01-01 18:00:00          NaN       NaN          NaN       NaN     0.986673  0.338054
2019-01-01 19:00:00          NaN       NaN          NaN       NaN     0.239875  0.796436
2019-01-01 20:00:00          NaN       NaN          NaN       NaN     0.063686  0.364616
2019-01-01 21:00:00          NaN       NaN          NaN       NaN     0.070023  0.319368
2019-01-01 22:00:00          NaN       NaN          NaN       NaN     0.070383  0.290264
2019-01-01 23:00:00          NaN       NaN          NaN       NaN     0.790101  0.905400
2019-01-02 00:00:00          NaN       NaN     0.792621  0.561819          NaN       NaN
2019-01-02 01:00:00          NaN       NaN     0.616018  0.361484          NaN       NaN
2019-01-02 02:00:00          NaN       NaN     0.168817  0.436241          NaN       NaN
2019-01-02 03:00:00          NaN       NaN     0.732825  0.062888          NaN       NaN
2019-01-02 04:00:00          NaN       NaN     0.020733  0.770548          NaN       NaN
2019-01-02 05:00:00          NaN       NaN     0.299952  0.701164          NaN       NaN
2019-01-02 06:00:00          NaN       NaN     0.734668  0.932905          NaN       NaN
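The duplicated columns in this output appear to come from `axis=1`, which places the frames side by side and aligns them on the index. Since the three pieces share the same two columns and cover disjoint timestamps, stacking them row-wise (the default `axis=0`) and sorting the index is one way to get a single two-column frame. A minimal sketch, assuming a shortened two-week sample instead of the 50000 rows above:

```python
import numpy as np
import pandas as pd

np.random.seed(11)
rows = 24 * 14  # shortened sample, an assumption for illustration
tidx = pd.date_range('2019-01-01', periods=rows, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.random.rand(rows, 2),
                  columns=['Temperature', 'Value'], index=tidx)

# Same three subsets as in the question, built without copies
weekday = df.index.dayofweek < 5
df_weekend = df[~weekday]
df_weeknights_AM = df[weekday & (df.index.hour < 7)]
df_weeknights_PM = df[weekday & (df.index.hour > 17)]

# axis=0 (default) appends rows; sort_index restores chronological order
combined = pd.concat([df_weekend, df_weeknights_AM, df_weeknights_PM]).sort_index()
print(combined.columns.tolist())
print(combined.head())
```

Because the subsets do not overlap, no timestamp should appear twice in `combined`.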


python pandas
1 Answer
0 votes

You can use DataFrame.between_time to keep only the rows that fall between two specific times of day, and DataFrame.between_time to keep only numerical ...
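One possible reading of this suggestion, sketched below: select the 7 AM to 5 PM rows with `between_time`, narrow them to weekdays, and drop those timestamps from the original frame. The sample data, the variable names, and the comparison against a plain boolean mask are assumptions added for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

np.random.seed(11)
rows = 24 * 14  # two-week sample, an assumption for illustration
tidx = pd.date_range('2019-01-01', periods=rows, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.random.rand(rows, 2),
                  columns=['Temperature', 'Value'], index=tidx)

# Rows between 07:00 and 17:00 on any day (both endpoints included by default)
work_hours = df.between_time('07:00', '17:00')

# Restrict to Monday-Friday and drop those timestamps from the full frame
to_drop = work_hours[work_hours.index.dayofweek < 5].index
filtered = df.drop(to_drop)

# Equivalent single boolean mask, without between_time
mask = (df.index.dayofweek < 5) & (df.index.hour >= 7) & (df.index.hour <= 17)
filtered_mask = df[~mask]

print(filtered.equals(filtered_mask))
```

Either form keeps the original two columns and avoids the copies and the concat step entirely.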
