How can I minimize the negative p-value returned by kstest in Python when fitting a distribution to price returns?


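I am downloading stock prices with the pandas-datareader library and computing returns (daily, weekly, monthly, etc.) from those quotes over a lag that I specify.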
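As a minimal sketch of the return calculation described here, the lagged log return can be computed with a shift. The helper name log_returns and the synthetic price series are assumptions for illustration; they stand in for the downloaded 'Close' column so that no network access is needed.

```python
import numpy as np
import pandas as pd


def log_returns(close: pd.Series, lag: int = 1) -> pd.Series:
    """Log return over `lag` rows: ln(P_t) - ln(P_{t-lag})."""
    log_price = np.log(close)
    # The first `lag` rows have no earlier price to compare with, so drop them.
    return (log_price - log_price.shift(lag)).dropna()


# Synthetic prices growing by 10% per step (hypothetical data).
close = pd.Series([100.0, 110.0, 121.0, 133.1])

one_step = log_returns(close, lag=1)   # three returns, each ln(1.1)
two_step = log_returns(close, lag=2)   # two returns, each 2 * ln(1.1)
print(one_step.values)
print(two_step.values)
```

The lag counts rows, so with daily quotes a lag of 5 is roughly a weekly return and a lag of 21 roughly a monthly one.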

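After downloading the data, I run a kstest on the distribution of these returns and use the resulting p-value to judge whether it resembles a bi-normal distribution (a weighted mixture of two normal distributions).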

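Since I only run a single kstest on this distribution, I would like to use "minimize" from scipy.optimize to maximize the p-value (that is, minimize the negative p-value) by varying the means, standard deviations and weights of the two distributions.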
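To make the objective concrete, here is a small sketch on synthetic data (not real returns). The names mixture_cdf and neg_pvalue, the seed and the sample size are assumptions for illustration. The sample is drawn from a known two-normal mixture, so the p-value can be compared at the generating parameters and at a poor guess.

```python
import numpy as np
import scipy.stats as st


def mixture_cdf(x, w1, mu1, sd1, mu2, sd2):
    """CDF of a two-component normal mixture; the second weight is 1 - w1."""
    return w1 * st.norm.cdf(x, mu1, sd1) + (1.0 - w1) * st.norm.cdf(x, mu2, sd2)


def neg_pvalue(params, sample):
    """Objective to minimize: the negative K-S p-value for these parameters."""
    stat, pvalue = st.kstest(sample, mixture_cdf, args=tuple(params))
    return -pvalue


# Synthetic sample from a known mixture (hypothetical data).
rng = np.random.default_rng(0)
n = 2000
from_first = rng.random(n) < 0.2
sample = np.where(from_first,
                  rng.normal(0.002, 0.011, n),
                  rng.normal(-0.002, 0.023, n))

true_params = [0.2, 0.002, 0.011, -0.002, 0.023]
poor_guess = [0.3, 0.001, 0.001, -0.003, 0.083]

p_true = -neg_pvalue(true_params, sample)
p_poor = -neg_pvalue(poor_guess, sample)
print(p_true, p_poor)
```

At a poor guess the p-value is vanishingly small, which matters for any optimizer that relies on changes in the objective value.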

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.optimize import minimize
import statsmodels as sm
import matplotlib
import matplotlib.pyplot as plt
from pandas_datareader import data
import time
import xlwt
import matplotlib.ticker as mtick
from sklearn import datasets

def Puxa_Preco(ticker, start_date, end_date, lag):
    # Download daily quotes from Yahoo Finance via pandas-datareader.
    dados = data.get_data_yahoo(ticker, start_date, end_date)

    # Log return over `lag` trading days: ln(P_t) - ln(P_{t-lag}).
    log_close = np.log(dados['Close'])
    data_set = log_close - log_close.shift(lag)

    # Forward-fill gaps, then drop the leading NaNs created by the shift.
    data_set = data_set.ffill().dropna()

    print(data_set)
    return data_set

def mixnormal_cdf(distribuicao, weight1, mean1, stdv1,weight2, mean2, stdv2):
    """
    CDF of a mixture of two normal distributions.
    """
    return (weight1*st.norm.cdf(distribuicao, mean1, stdv1) +
            weight2*st.norm.cdf(distribuicao, mean2, stdv2))

def Objetivo(X, distribuicao):
    """
    Negative p-value of a Kolmogorov-Smirnov test of the data against a
    mixture of two normal distributions with parameters X.

    X = [peso_dist_1, mi1, sigma1, mi2, sigma2]; the second weight is
    1 - peso_dist_1.

    The p-value is the probability, assuming the data do come from the given
    distribution, of a test statistic at least as large as the one observed.
    A low p-value therefore means the data are unlikely to come from the
    tested distribution. The distribution must be fully specified (here:
    weights, means and standard deviations) to obtain meaningful results.

    stat2: the test statistic, i.e. the maximum distance between the
    empirical and the theoretical cumulative distributions.
    """
    peso_dist_1 = X[0]
    mi1 = X[1]
    sigma1 = X[2]
    peso_dist_2 = 1 - X[0]
    mi2 = X[3]
    sigma2 = X[4]

    stat2, pvalue = st.kstest(distribuicao, cdf=mixnormal_cdf,
                              args=(peso_dist_1, mi1, sigma1, peso_dist_2, mi2, sigma2))

    return -pvalue

ticker = 'PETR4.SA'
start_date  = '2010-01-02'      #yyyy-mm-dd
end_date    = '2015-01-02'

for lag in range(1,503):

    distribuicao = Puxa_Preco(ticker,start_date,end_date,lag)
    n = len(distribuicao)

    ChuteInicial = [0.3, 0.0010, 0.0010, -0.0030, 0.0830]   # peso_dist_1, mi1, sigma1, mi2, sigma2
    # Objetivo reads five parameters (the second weight is 1 - peso_dist_1),
    # so the validation vector and the bounds need five entries, not six.
    test = [0.2, 0.0020, 0.0110, -0.0020, 0.0230]           # peso_dist_1, mi1, sigma1, mi2, sigma2
    Limites = ([0, 1], [-50, +50], [1e-6, +50], [-50, +50], [1e-6, +50])   # sigma bounds kept strictly above 0
    print("------------------------------------------------------------------------------------------------")
    print("Validation Test:")

    print(-Objetivo(test, distribuicao))   # prints the p-value; it should be around 0.90 if the objective function is OK

    solution = minimize(fun=Objetivo, x0=ChuteInicial, args=(distribuicao,), method='SLSQP', bounds=Limites)   # minimize -p-value
    print("------------------------------------------------------------------------------------------------")
    print("solution:")
    print(solution)

It finds the following solution:

         fun: -8.098252265651002e-53
         jac: array([-2.13080032e-35,  0.00000000e+00,  0.00000000e+00, -1.93307671e-34, 7.91878934e-35])
     message: 'Optimization terminated successfully.'
        nfev: 8
         nit: 1
        njev: 1
      status: 6
     success: True
           x: array([ 0.3  ,  0.001,  0.001, -0.003,  0.083])

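However, I know the correct answer should be close to (test): [0.2,0.0020,0.0110,0.8,-0.0020,0.0230] (weight, mean and standard deviation of each of the two components), which gives a p-value of 0.90.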

It seems to me that it ran only a few function evaluations and stopped because the p-value did not change.
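One plausible reading of this behaviour, sketched on synthetic data: at the starting guess the p-value is so close to zero that a sizeable change in a parameter moves it by far less than SLSQP's default ftol of 1e-6, so the optimizer sees a flat function. The K-S statistic still responds to the same change. The helper names, the seed and the 10% nudge are assumptions for illustration.

```python
import numpy as np
import scipy.stats as st


def mixture_cdf(x, w1, mu1, sd1, mu2, sd2):
    """CDF of a two-component normal mixture; the second weight is 1 - w1."""
    return w1 * st.norm.cdf(x, mu1, sd1) + (1.0 - w1) * st.norm.cdf(x, mu2, sd2)


def ks(params, sample):
    """Return (statistic, p-value) of the K-S test for these parameters."""
    result = st.kstest(sample, mixture_cdf, args=tuple(params))
    return result.statistic, result.pvalue


# Synthetic sample from a known mixture (hypothetical data).
rng = np.random.default_rng(0)
n = 2000
from_first = rng.random(n) < 0.2
sample = np.where(from_first,
                  rng.normal(0.002, 0.011, n),
                  rng.normal(-0.002, 0.023, n))

start = np.array([0.3, 0.001, 0.001, -0.003, 0.083])
nudged = start.copy()
nudged[4] *= 0.9          # shrink sigma2 by 10%

stat0, p0 = ks(start, sample)
stat1, p1 = ks(nudged, sample)

SLSQP_DEFAULT_FTOL = 1e-6  # default value of the 'ftol' option for SLSQP
print("p-value change   :", abs(p1 - p0))
print("statistic change :", abs(stat1 - stat0))
print("p-value change below ftol:", abs(p1 - p0) < SLSQP_DEFAULT_FTOL)
```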

Is there a way to make "minimize" stop only once it has found a p-value greater than 0.9? Can anyone help?

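I tried the Nelder-Mead method of "minimize", which looks more accurate, but it did not get anywhere near the p-value of 0.9 that should be the answer, and I do not know whether the Nelder-Mead method takes the bounds I supplied into account.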

#solution = minimize(fun=Objetivo, x0=ChuteInicial, args=(distribuicao,), method='Nelder-Mead', bounds=Limites, options={'maxiter': 1000000})   # 'int' is not a Nelder-Mead option; 'maxiter' was presumably intended
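On the question of bounds: as far as I know, Nelder-Mead in scipy.optimize.minimize accepts bounds from SciPy 1.7 onward, while older versions ignore them with a warning. The sketch below assumes SciPy 1.7 or later and uses a toy objective (a hypothetical stand-in, not the K-S objective) whose unconstrained minimum lies outside the box, so it is easy to see whether the bounds are respected. The iteration cap is passed through the 'maxiter' option.

```python
import numpy as np
from scipy.optimize import minimize


def toy_objective(x):
    """Quadratic bowl with its unconstrained minimum at (2, -1)."""
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2


x0 = [0.5, 0.5]
bounds = [(0.0, 1.0), (0.0, 1.0)]   # the minimum at (2, -1) is outside this box

result = minimize(
    toy_objective,
    x0=x0,
    method='Nelder-Mead',
    bounds=bounds,
    options={'maxiter': 10_000, 'xatol': 1e-8, 'fatol': 1e-8},
)

print(result.x)     # stays inside the box if bounds are honoured
print(result.fun)
```

If the printed point lies outside the box, the installed SciPy version is probably too old to support bounds with this method.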


python scipy minimize p-value kolmogorov-smirnov
1 Answer

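By minimizing the K-S statistic instead of the p-value, along with some other changes to how the CDF function is defined, I think I was able to estimate the parameters. Here is my code and the optimized parameter estimates. I got the idea of minimizing the K-S statistic from this paper (https://www.researchgate.net/publication/250298633_Minimum_Kolmogorov-Smirnov_test_statistic_parameter_estimates).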
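A minimal, independent sketch of that idea (not the answerer's original code) might look like the following. It assumes SciPy 1.7 or later for bounded Nelder-Mead and uses synthetic data in place of real returns. The helper names, bounds, seed and solver options are illustrative choices. The K-S statistic is computed directly with NumPy so that each evaluation is cheap, and kstest is called once at the end to report a p-value.

```python
import numpy as np
import scipy.stats as st
from scipy.optimize import minimize


def mixture_cdf(x, w1, mu1, sd1, mu2, sd2):
    """CDF of a two-component normal mixture; the second weight is 1 - w1."""
    return w1 * st.norm.cdf(x, mu1, sd1) + (1.0 - w1) * st.norm.cdf(x, mu2, sd2)


def ks_statistic(params, sorted_sample):
    """Two-sided K-S distance between the empirical CDF and the mixture CDF."""
    n = len(sorted_sample)
    cdf = mixture_cdf(sorted_sample, *params)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)


# Synthetic sample from a known mixture (hypothetical data).
rng = np.random.default_rng(42)
n = 1500
from_first = rng.random(n) < 0.2
sample = np.where(from_first,
                  rng.normal(0.002, 0.011, n),
                  rng.normal(-0.002, 0.023, n))
sorted_sample = np.sort(sample)

start = [0.3, 0.001, 0.001, -0.003, 0.083]   # w1, mu1, sd1, mu2, sd2
bounds = [(0.01, 0.99), (-0.05, 0.05), (1e-4, 0.2), (-0.05, 0.05), (1e-4, 0.2)]

stat_start = ks_statistic(start, sorted_sample)
result = minimize(
    ks_statistic,
    x0=start,
    args=(sorted_sample,),
    method='Nelder-Mead',
    bounds=bounds,
    options={'maxiter': 3000, 'xatol': 1e-7, 'fatol': 1e-7},
)

fitted_stat, fitted_p = st.kstest(sample, mixture_cdf, args=tuple(result.x))
print("statistic at start :", stat_start)
print("statistic at fit   :", result.fun)
print("p-value at fit     :", fitted_p)
print("fitted parameters  :", result.x)
```

Because the p-value decreases monotonically in the statistic for a fixed sample size, lowering the statistic also raises the p-value, but the statistic stays informative far from the optimum. The fitted parameters need not match the generating ones exactly, since a mixture can fit the same sample about equally well with different parameter combinations. A p-value obtained after fitting parameters to the same data is also optimistic and should not be read as a formal test result.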
