Pandas resample does not work with the mean() method


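I am resampling a time series that has a 12-day frequency. I want to resample it to a monthly frequency by grouping the values by month. Resampling with sum and count works fine, but resampling with mean does not.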

Here is the code I'm using.

import pandas as pd

date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
'01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
'24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']

values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
-0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
-0.057302,-0.052964,-0.076545]

df = pd.DataFrame([date,values]).T    # If not transposed it's not well organized
df.columns = ['Date','Values']
df.Date = df.Date.map(lambda x: pd.to_datetime(x,dayfirst=True)) 
df.reset_index()
df = df.set_index(['Date'])
df.resample('M').mean()

The dates are in datetime format, and the time series values are floats.

Even so, this is the error that keeps coming up.

df.resample('M').mean()

File "C:\WPy64-3760\python-3.7.6.amd64\lib\site-packages\pandas\core\groupby\generic.py", line 188, in _cython_agg_blocks
    raise DataError("No numeric types to aggregate")

DataError: No numeric types to aggregate

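Note that not every month in the time series contains more than one value, and some months may have no data at all. I assume this shouldn't cause any trouble. By the way, I'm using pandas version 0.25.3.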

I don't know what's going on.

python pandas mean
1 answer
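  • When the dataframe is created with pd.DataFrame([date,values]).T, every column is treated as object dtype. The Values column is never converted to float, which is why mean() finds no numeric types to aggregate.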
import pandas as pd

# data
date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
        '01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
        '24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']

values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
          -0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
          -0.057302,-0.052964,-0.076545]

# create dataframe
# Values is properly recognized as a float 
df = pd.DataFrame({'Date': date, 'Values': values})

# Convert Date to a datetime and set as the index
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index(['Date'])

# resample
df.resample('M').mean()
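  • Several months in this data contain only a single value, so for those months the resampled mean is simply that value; averaging only changes anything where a month has multiple values.
    • Having only one value in a month does not raise an error.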
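The dtype problem described in the first point can be checked directly, and an existing frame can be repaired without rebuilding it. The following is a minimal sketch, assuming a shortened subset of the question's data; the variable names raw, fixed and monthly are illustrative and not from the original post. It uses 'MS' (month-start labels) rather than 'M' only because the month-end alias is spelled differently across pandas versions ('M' in older releases, 'ME' in newer ones).

```python
import pandas as pd

# Shortened, illustrative subset of the question's data
date = ['09/03/2015', '02/04/2015', '26/04/2015', '08/05/2015']
values = [4.2e-05, -0.003414, 0.016886, 0.010597]

# Built the way the question does it: each row of the untransposed frame
# mixes strings and floats, so after .T both columns are object dtype
raw = pd.DataFrame([date, values]).T
raw.columns = ['Date', 'Values']
print(raw.dtypes)  # both columns report object

# Repair the existing frame instead of rebuilding it
fixed = raw.copy()
fixed['Date'] = pd.to_datetime(fixed['Date'], dayfirst=True)
fixed['Values'] = pd.to_numeric(fixed['Values'])  # object -> float64
fixed = fixed.set_index('Date')

# 'MS' labels each bin by month start (assumption: chosen for portability)
monthly = fixed.resample('MS').mean()
print(monthly)
```

Checking df.dtypes before aggregating is a quick way to catch this kind of problem; fixed['Values'].astype(float) would be an alternative to pd.to_numeric here.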

A working example of monthly resampling

import pandas as pd
import numpy as np

# data
np.random.seed(365)
data = {'a': [np.random.randint(10) for _ in range(40)],
        'b': [np.random.randint(10) for _ in range(40)],
        'c': [np.random.randint(10) for _ in range(40)],
        'd': [np.random.randint(10) for _ in range(40)],
        'e': [np.random.randint(10) for _ in range(40)],
        # fixed weekly (Sunday) dates so the dates in the output below are reproducible
        'date': pd.date_range('2020-05-17', freq='W', periods=40).tolist()}

# dataframe
df = pd.DataFrame(data)

# set index
df.set_index('date', inplace=True)

print(df.head())

            a  b  c  d  e
date                     
2020-05-17  2  1  6  8  6
2020-05-24  4  4  5  9  1
2020-05-31  1  0  7  9  5
2020-06-07  5  9  7  7  7
2020-06-14  2  6  9  5  6

# resample
df.resample('M').mean()

                   a         b     c         d     e
date                                                
2020-05-31  2.333333  1.666667  6.00  8.666667  4.00
2020-06-30  4.500000  6.500000  6.25  3.500000  6.00
2020-07-31  3.750000  4.750000  2.25  3.500000  5.75
2020-08-31  4.800000  6.000000  2.00  3.800000  4.00
2020-09-30  4.250000  4.500000  3.00  4.750000  6.75
2020-10-31  5.500000  3.500000  5.00  6.750000  7.25
2020-11-30  5.400000  6.600000  5.60  5.200000  4.20
2020-12-31  6.250000  6.750000  5.75  4.500000  3.25
2021-01-31  7.200000  3.200000  3.20  5.200000  4.20
2021-02-28  4.500000  3.500000  2.50  5.500000  3.50
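Returning to the question's own data, the concern about months with a single value or no values can be inspected by computing the count next to the mean. This is a sketch rather than part of the original answer: the names s and summary are illustrative, and 'MS' (month-start labels) is used only because the month-end alias differs between pandas versions.

```python
import pandas as pd

date = ['09/03/2015', '02/04/2015', '26/04/2015', '08/05/2015', '20/05/2015',
        '01/06/2015', '13/06/2015', '25/06/2015', '07/07/2015', '31/07/2015',
        '12/08/2015', '24/08/2015', '23/10/2015', '04/11/2015', '16/11/2015',
        '28/11/2015', '22/12/2015']

values = [4.2e-05, -0.003414, 0.016886, 0.010597, -0.015756, -0.011592,
          -0.018709, -0.031948, -0.000361, 0.033206, 0.122711, 0.092198,
          0.067306, 0.000668, -0.057302, -0.052964, -0.076545]

# A float Series indexed by parsed dates (day comes first in the strings)
s = pd.Series(values, index=pd.to_datetime(date, dayfirst=True), name='Values')

# One row per calendar month, with the mean and the number of observations
summary = s.resample('MS').agg(['mean', 'count'])
print(summary)
```

A month with a single observation reports that observation as its mean, and a month with no observations (September 2015 in this data) appears with a count of 0 and a NaN mean; neither case raises an error.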