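I am resampling a time series that has a 12-day frequency. I want to resample it to a monthly frequency by grouping the observations by month. Resampling with sum and count works fine, but resampling with mean does not.
This is the code I am using: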
import pandas as pd
date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
'01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
'24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']
values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
-0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
-0.057302,-0.052964,-0.076545]
df = pd.DataFrame([date,values]).T # If not transposed it's not well organized
df.columns = ['Date','Values']
df.Date = df.Date.map(lambda x: pd.to_datetime(x,dayfirst=True))
df.reset_index()
df = df.set_index(['Date'])
df.resample('M').mean()
The dates are datetimes and the time series values are floats.
Even so, this is the error that keeps coming up:
df.resample('M').mean()
File "C:\WPy64-3760\python-3.7.6.amd64\lib\site-packages\pandas\core\groupby\generic.py", line 188, in _cython_agg_blocks
raise DataError("No numeric types to aggregate")
DataError: No numeric types to aggregate
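Note that not every month of the series contains more than one value, and some months may have no data at all. I assume this should not cause any trouble. By the way, I am using pandas 0.25.3.
I don't know what is going on.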
With pd.DataFrame([date,values]).T, every column is stored with object dtype, so the Values column is never set to float. Build the DataFrame from a dict instead:
import pandas as pd
# data
date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
'01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
'24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']
values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
-0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
-0.057302,-0.052964,-0.076545]
# create dataframe
# Values is properly recognized as a float
df = pd.DataFrame({'Date': date, 'Values': values})
# Convert Date to a datetime and set as the index
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index(['Date'])
# resample
df.resample('M').mean()
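If the object-dtype DataFrame already exists and cannot easily be rebuilt, converting the columns explicitly is another option. The following is a minimal sketch on a three-row subset of the question's data; the names bad and fixed are illustrative only and do not come from the original post.

```python
import pandas as pd

# A short subset of the question's data
date = ['09/03/2015', '02/04/2015', '26/04/2015']
values = [4.2e-05, -0.003414, 0.016886]

# Each row mixes strings and floats, so every column ends up as object dtype
bad = pd.DataFrame([date, values]).T
bad.columns = ['Date', 'Values']
print(bad.dtypes)

# Convert each column explicitly instead of rebuilding the DataFrame
fixed = bad.copy()
fixed['Date'] = pd.to_datetime(fixed['Date'], dayfirst=True)
fixed['Values'] = pd.to_numeric(fixed['Values'])
fixed = fixed.set_index('Date')
print(fixed.dtypes)
```

Checking df.dtypes before aggregating is a quick way to spot this kind of problem; once Values is a float column, the resample call from the question has a numeric type to average.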
# a second example: weekly data with several numeric columns
import pandas as pd
import numpy as np
from datetime import datetime
# data
np.random.seed(365)
data = {'a': [np.random.randint(10) for _ in range(40)],
'b': [np.random.randint(10) for _ in range(40)],
'c': [np.random.randint(10) for _ in range(40)],
'd': [np.random.randint(10) for _ in range(40)],
'e': [np.random.randint(10) for _ in range(40)],
'date': pd.bdate_range(datetime.today(), freq='w', periods=40).tolist()}
# dataframe
df = pd.DataFrame(data)
# set index
df.set_index('date', inplace=True)
print(df.head())
            a  b  c  d  e
date
2020-05-17  2  1  6  8  6
2020-05-24  4  4  5  9  1
2020-05-31  1  0  7  9  5
2020-06-07  5  9  7  7  7
2020-06-14  2  6  9  5  6
# resample
df.resample('M').mean()
                   a         b     c         d     e
date
2020-05-31  2.333333  1.666667  6.00  8.666667  4.00
2020-06-30  4.500000  6.500000  6.25  3.500000  6.00
2020-07-31  3.750000  4.750000  2.25  3.500000  5.75
2020-08-31  4.800000  6.000000  2.00  3.800000  4.00
2020-09-30  4.250000  4.500000  3.00  4.750000  6.75
2020-10-31  5.500000  3.500000  5.00  6.750000  7.25
2020-11-30  5.400000  6.600000  5.60  5.200000  4.20
2020-12-31  6.250000  6.750000  5.75  4.500000  3.25
2021-01-31  7.200000  3.200000  3.20  5.200000  4.20
2021-02-28  4.500000  3.500000  2.50  5.500000  3.50
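The question also mentions that some months have no observations (September 2015, for instance). That does not raise an error: a month without data shows up as NaN in the mean and as 0 in the count. Here is a small sketch using three of the question's values. It assumes the 'MS' (month start) alias instead of 'M', because newer pandas releases deprecate 'M' in favour of 'ME'; the variable names are illustrative.

```python
import pandas as pd

# Two observations in August 2015, none in September, one in October
df = pd.DataFrame(
    {'Values': [0.122711, 0.092198, 0.067306]},
    index=pd.to_datetime(['12/08/2015', '24/08/2015', '23/10/2015'],
                         dayfirst=True),
)

# Months are labelled by their first day because of 'MS'
monthly_mean = df.resample('MS').mean()    # the empty month becomes NaN
monthly_count = df.resample('MS').count()  # the empty month becomes 0

# Drop the empty months if they are not wanted in the result
monthly_mean_no_gaps = monthly_mean.dropna()

print(monthly_mean)
print(monthly_count)
```

Whether to keep the NaN rows or drop them depends on how the monthly series is used afterwards; keeping them preserves a regular monthly index.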