Autocorrelation with significance stars in Python

Question

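I want to add significance stars (based on p-values) to the column-wise autocorrelations of a df.

How can I attach the significance stars next to each autocorrelation coefficient?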

from statsmodels.tsa.stattools import acf

def autocorr_with_asterisks(df):
    """
    Calculate autocorrelation coefficients and add significance asterisks.
    
    Parameters:
        df (DataFrame): Input DataFrame with time series data.
    
    Returns:
        DataFrame: DataFrame containing autocorrelation coefficients with significance asterisks.
    """
    autocorr_df = pd.DataFrame
    asterisks = []

    for col in df.columns:
        acf_vals = acf(df[col],nlags=9, qstat=True)
        autocorr_df[col] = acf_vals[0]
        

    col_asterisks = []
    for p_val in acf_vals[2]:
        if p_val < 0.01:
            col_asterisks.append('***')
        elif p_val < 0.05:
            col_asterisks.append('**')
        elif p_val < 0.1:
            col_asterisks.append('*')
        else:
            col_asterisks.append('')
    asterisks.append(col_asterisks)

    autocorr_df_with_asterisks = autocorr_df.astype(str) + np.array(asterisks).T
    return autocorr_df_with_asterisks
  

Sample data:

import pandas as pd
import numpy as np 

# Create a DataFrame of random values: 100 rows, 5 return columns
df = pd.DataFrame(np.random.randn(100, 5), columns=['Return_1', 'Return_2', 'Return_3', 'Return_4', 'Return_5'])
Tags: pandas, p-value, significance, statsmodels

1 answer

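Your solution is not far off. Here is one way to fix it. I created a new DataFrame to show an example. Note also that I added the number of lags as a parameter of the function, so you can change it as needed.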

from statsmodels.tsa.stattools import acf, q_stat
import pandas as pd
import numpy as np


t = np.arange(250)
series1 = np.sin(t/10) + np.random.normal(0, 1, size=len(t))
series2 = np.cos(t/20) + np.random.normal(0, 1, size=len(t))
series3 = np.sin(t/30) + np.random.normal(0, 1, size=len(t))
series4 = np.random.normal(0.5, 0.5, size=len(t))  
series5 = np.cumsum(np.random.normal(0.5, 0.5, size=len(t)))  
df = pd.DataFrame({
    'Series1': series1,
    'Series2': series2,
    'Series3': series3,
    'Series4': series4,
    'Series5': series5
})



def autocorr_with_asterisks(df, nlags=9):
    """
    Calculate autocorrelation coefficients and add significance asterisks based on Ljung-Box Q statistic.
    
    Parameters:
        df (DataFrame): Input DataFrame with time series data.
        nlags (int): Number of lags for autocorrelation calculation.
    
    Returns:
        DataFrame: DataFrame containing autocorrelation coefficients with significance asterisks.
    """
    autocorr_df = pd.DataFrame(index=range(nlags+1))  

    for col in df.columns:
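        # Autocorrelations for lags 0..nlags
        acf_vals = acf(df[col], nlags=nlags, fft=True)
        # Ljung-Box Q statistics and p-values for lags 1..nlags
        qstats, p_vals = q_stat(acf_vals[1:], nobs=len(df))
        # Lag 0 has no test, so pad it with NaN to align with acf_vals
        p_vals = np.append([np.nan], p_vals)

        # NaN fails every comparison, so lag 0 gets no stars
        asterisks = ['***' if p < 0.01 else '**' if p < 0.05 else '*' if p < 0.1 else '' for p in p_vals]
        autocorr_df[col] = [f"{val:.2f}{ast}" for val, ast in zip(acf_vals, asterisks)]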

    return autocorr_df


autocorr_df_with_significance = autocorr_with_asterisks(df)
autocorr_df_with_significance

This gives:

  Series1 Series2 Series3 Series4 Series5
0   1.00*   1.00*   1.00*   1.00*   1.00*
1   0.33*   0.22*   0.36*    0.08   0.99*
2   0.40*   0.26*   0.41*   -0.02   0.98*
3   0.35*   0.22*   0.38*   -0.08   0.96*
4   0.29*   0.28*   0.36*    0.01   0.95*
5   0.21*   0.27*   0.33*    0.05   0.94*
6   0.26*   0.24*   0.28*    0.08   0.93*
7   0.30*   0.23*   0.35*   -0.03   0.91*
8   0.22*   0.26*   0.32*   -0.04   0.90*
9   0.26*   0.18*   0.30*   -0.00   0.89*

In this case only a single asterisk appears, but with better real data the function should produce the result you are after.
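
The threshold logic in the list comprehension above can also be pulled out into a small helper, which makes the handling of the NaN placeholder at lag 0 explicit. The following is only a minimal sketch: the name `p_to_stars` is hypothetical and not part of pandas or statsmodels, and the example values are made up purely for illustration.

```python
import math


def p_to_stars(p):
    """Map a p-value to significance stars; NaN or None yields no stars."""
    if p is None or math.isnan(p):
        return ""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# Illustrative (coefficient, p-value) pairs, not real results
pairs = [(1.0, float("nan")), (0.33, 0.004), (0.08, 0.2)]
labels = [f"{v:.2f}{p_to_stars(p)}" for v, p in pairs]
print(labels)
```

Inside the function from the answer, the comprehension could then read `[p_to_stars(p) for p in p_vals]`.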
