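I want to add significance stars (based on p-values) to the autocorrelations of each column in a df.
How can I place the significance stars next to each autocorrelation coefficient?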
from statsmodels.tsa.stattools import acf

def autocorr_with_asterisks(df):
    """
    Calculate autocorrelation coefficients and add significance asterisks.
    Parameters:
    df (DataFrame): Input DataFrame with time series data.
    Returns:
    DataFrame: DataFrame containing autocorrelation coefficients with significance asterisks.
    """
    autocorr_df = pd.DataFrame
    asterisks = []
    for col in df.columns:
        acf_vals = acf(df[col],nlags=9, qstat=True)
        autocorr_df[col] = acf_vals[0]
        col_asterisks = []
        for p_val in acf_vals[2]:
            if p_val < 0.01:
                col_asterisks.append('***')
            elif p_val < 0.05:
                col_asterisks.append('**')
            elif p_val < 0.1:
                col_asterisks.append('*')
            else:
                col_asterisks.append('')
        asterisks.append(col_asterisks)
    autocorr_df_with_asterisks = autocorr_df.astype(str) + np.array(asterisks).T
    return autocorr_df_with_asterisks
Sample data:
import pandas as pd
import numpy as np
# Create a DataFrame of random values with the specified columns and rows
df = pd.DataFrame(np.random.randn(100, 5), columns=['Return_1', 'Return_2', 'Return_3', 'Return_4', 'Return_5'])
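Your solution is not far off. Here is one way to solve it. I created a new DataFrame to show an example. Also note that I added the number of lags to the function definition so that you can change it as needed.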
from statsmodels.tsa.stattools import acf, q_stat
import pandas as pd
import numpy as np

# Example data: three noisy periodic series, white noise, and a random walk with drift
t = np.arange(250)
series1 = np.sin(t/10) + np.random.normal(0, 1, size=len(t))
series2 = np.cos(t/20) + np.random.normal(0, 1, size=len(t))
series3 = np.sin(t/30) + np.random.normal(0, 1, size=len(t))
series4 = np.random.normal(0.5, 0.5, size=len(t))
series5 = np.cumsum(np.random.normal(0.5, 0.5, size=len(t)))
df = pd.DataFrame({
    'Series1': series1,
    'Series2': series2,
    'Series3': series3,
    'Series4': series4,
    'Series5': series5
})

def autocorr_with_asterisks(df, nlags=9):
    """
    Calculate autocorrelation coefficients and add significance asterisks based on Ljung-Box Q statistic.
    Parameters:
    df (DataFrame): Input DataFrame with time series data.
    nlags (int): Number of lags for autocorrelation calculation.
    Returns:
    DataFrame: DataFrame containing autocorrelation coefficients with significance asterisks.
    """
    autocorr_df = pd.DataFrame(index=range(nlags+1))
    for col in df.columns:
        # acf returns nlags + 1 values (lag 0 through lag nlags)
        acf_vals = acf(df[col], nlags=nlags, fft=True)
        # The Ljung-Box test is defined for lags 1..nlags, so lag 0 is skipped
        qstats, p_vals = q_stat(acf_vals[1:], nobs=len(df))
        # Pad lag 0 with NaN so the p-values line up with acf_vals
        p_vals = np.append([np.nan], p_vals)
        # Comparisons with NaN are False, so lag 0 gets no asterisks
        asterisks = ['***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.1 else '' for p in p_vals]
        autocorr_df[col] = [f"{val:.2f}{ast}" for val, ast in zip(acf_vals, asterisks)]
    return autocorr_df

autocorr_df_with_significance = autocorr_with_asterisks(df)
autocorr_df_with_significance
which results in:
   Series1  Series2  Series3  Series4  Series5
0    1.00*    1.00*    1.00*    1.00*    1.00*
1    0.33*    0.22*    0.36*     0.08    0.99*
2    0.40*    0.26*    0.41*    -0.02    0.98*
3    0.35*    0.22*    0.38*    -0.08    0.96*
4    0.29*    0.28*    0.36*     0.01    0.95*
5    0.21*    0.27*    0.33*     0.05    0.94*
6    0.26*    0.24*    0.28*     0.08    0.93*
7    0.30*    0.23*    0.35*    -0.03    0.91*
8    0.22*    0.26*    0.32*    -0.04    0.90*
9    0.26*    0.18*    0.30*    -0.00    0.89*
In this case only a single asterisk appears, but with better real data it should produce the result you want.