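I am experimenting with concat and pandas, trying to filter out Monday through Friday, 7 AM to 5 PM, from a dataset. So basically the only data left should be weekends at all hours, plus weeknights from 6 PM to 6 AM.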
import numpy as np
import pandas as pd
np.random.seed(11)
rows,cols = 50000,2
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='H')
df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)
df_weekend = df.copy()
df_weeknights_AM = df.copy()
df_weeknights_PM = df.copy()
df_weekend = df_weekend[
(df_weekend.index.dayofweek > 4)
]
df_weeknights_AM = df_weeknights_AM[
(df_weeknights_AM.index.dayofweek < 5)
&
(df_weeknights_AM.index.strftime('%H').astype('int') < 7)
]
df_weeknights_PM = df_weeknights_PM[
(df_weeknights_PM.index.dayofweek < 5)
&
(df_weeknights_PM.index.strftime('%H').astype('int') > 17)
]
Then I try to combine all of the datasets back together. I have also been experimenting with merge, but without much luck.
df2 = pd.concat([df_weekend, df_weeknights_AM], axis=1)
df3 = pd.concat([df2, df_weeknights_PM], axis=1)
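The problem is that the output looks a bit odd. I was hoping that no duplicate columns would be created, and that everything would instead be combined into a single dataset on the index (timestamp), with the same two original columns. Is there a best practice for this? When experimenting with merge, I got similar duplicated columns, labeled _x, _y, ...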
df3
Temperature Value Temperature Value Temperature Value
2019-01-01 00:00:00 NaN NaN 0.180270 0.019475 NaN NaN
2019-01-01 01:00:00 NaN NaN 0.463219 0.724934 NaN NaN
2019-01-01 02:00:00 NaN NaN 0.420204 0.485427 NaN NaN
2019-01-01 03:00:00 NaN NaN 0.012781 0.487372 NaN NaN
2019-01-01 04:00:00 NaN NaN 0.941807 0.850795 NaN NaN
2019-01-01 05:00:00 NaN NaN 0.729964 0.108736 NaN NaN
2019-01-01 06:00:00 NaN NaN 0.893904 0.857154 NaN NaN
2019-01-01 18:00:00 NaN NaN NaN NaN 0.986673 0.338054
2019-01-01 19:00:00 NaN NaN NaN NaN 0.239875 0.796436
2019-01-01 20:00:00 NaN NaN NaN NaN 0.063686 0.364616
2019-01-01 21:00:00 NaN NaN NaN NaN 0.070023 0.319368
2019-01-01 22:00:00 NaN NaN NaN NaN 0.070383 0.290264
2019-01-01 23:00:00 NaN NaN NaN NaN 0.790101 0.905400
2019-01-02 00:00:00 NaN NaN 0.792621 0.561819 NaN NaN
2019-01-02 01:00:00 NaN NaN 0.616018 0.361484 NaN NaN
2019-01-02 02:00:00 NaN NaN 0.168817 0.436241 NaN NaN
2019-01-02 03:00:00 NaN NaN 0.732825 0.062888 NaN NaN
2019-01-02 04:00:00 NaN NaN 0.020733 0.770548 NaN NaN
2019-01-02 05:00:00 NaN NaN 0.299952 0.701164 NaN NaN
2019-01-02 06:00:00 NaN NaN 0.734668 0.932905 NaN NaN
You can use DataFrame.between_time to keep only the rows that fall between two specific times of day, while DataFrame.between_time keeps only numerical ...
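As a rough sketch of how this could look with the sample data from the question (not necessarily what the answer above goes on to recommend): the names `is_weekend`, `is_night`, `filtered`, `weekdays`, and `stacked` are made up for illustration. The sketch assumes the goal is to drop weekday rows whose hour is 7 through 17. The duplicated columns in the question come from `axis=1`, which places the frames side by side; the default `axis=0` stacks rows and keeps the two original columns.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question (hourly index starting 2019-01-01).
np.random.seed(11)
rows, cols = 50000, 2
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(data, columns=['Temperature', 'Value'], index=tidx)

# Option 1: a single boolean mask, no concat needed.
is_weekend = df.index.dayofweek > 4            # Saturday=5, Sunday=6
hour = df.index.hour
is_night = (hour < 7) | (hour > 17)            # 00:00-06:59 and 18:00-23:59
filtered = df[is_weekend | is_night]

# Option 2: between_time for the weeknights, then stack rows with concat.
# A start time later than the end time wraps around midnight.
weekdays = df[df.index.dayofweek < 5]
weeknights = weekdays.between_time('18:00', '06:59')
# axis=0 (the default) appends rows instead of adding columns.
stacked = pd.concat([df[is_weekend], weeknights]).sort_index()

print(filtered.head())
print(stacked.equals(filtered))
```

Both options should yield one frame with only the `Temperature` and `Value` columns. The mask version avoids the intermediate copies entirely, while the `concat` version is closer to the approach attempted in the question.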