Constrained imputation


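I have two original datasets (they are related to each other in a specific way, but the details don't matter). Both contained outliers in the "value" column, which I removed, producing two new filtered datasets. My main goal is to impute the removed values, with the additional requirement that the imputed values satisfy a constraint: the relative difference between the sum of the two original (outlying) values, Y1 + Y2, and the sum of the two imputed values, X1 + X2, must stay below a threshold (a percentage, epsilon). I initialized the values with KNN imputation.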
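Written out, the constraint is |(X1 + X2) - (Y1 + Y2)| / |Y1 + Y2| < epsilon at every imputed position. Below is a minimal sketch of how one might count the positions that satisfy it, assuming NumPy arrays of equal length. The helper name `constraint_satisfied`, the absolute value in the denominator, and the handling of zero sums are illustrative assumptions, not part of the code in the question.

```python
import numpy as np

def constraint_satisfied(x1, x2, y1, y2, eps):
    """Boolean mask: True where the relative difference of the sums is below eps."""
    x_sum = np.asarray(x1, dtype=float) + np.asarray(x2, dtype=float)
    y_sum = np.asarray(y1, dtype=float) + np.asarray(y2, dtype=float)
    ok = np.zeros(y_sum.shape, dtype=bool)
    nonzero = y_sum != 0
    rel = np.abs(x_sum[nonzero] - y_sum[nonzero]) / np.abs(y_sum[nonzero])
    ok[nonzero] = rel < eps
    # Where the original sum is zero the relative difference is undefined;
    # this sketch (an assumption) only accepts an exact match there.
    ok[~nonzero] = x_sum[~nonzero] == 0
    return ok

# Small made-up example, not data from the question
y1 = np.array([10.0, 20.0, 0.0])
y2 = np.array([30.0, 20.0, 0.0])
x1 = np.array([11.0, 30.0, 0.0])
x2 = np.array([30.0, 20.0, 0.0])
mask = constraint_satisfied(x1, x2, y1, y2, eps=0.1)
print(mask.sum(), "of", mask.size, "positions satisfy the constraint")
```

Comparing this count before and after optimization gives a direct check of whether tuning `eps` or `lam` has any effect.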

Here is the code I wrote:

import numpy as np
import pandas as pd
from scipy.optimize import minimize
from sklearn.impute import KNNImputer

# Relative Huber loss
def huber_loss_relative(x, y, eps):
    # Relative difference |y - x| / y; where y == 0, divide by x instead
    # so that the loss does not divide by zero
    zero = y == 0
    denom = np.where(zero, x, y)
    diff = np.abs(y - x) / denom
    small = diff <= eps
    # Quadratic inside the eps band, linear outside it
    loss = np.where(small, 0.5 * diff ** 2, eps * (diff - 0.5 * eps))
    return np.mean(loss)




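# Objective function
def objective(x1, x2, y1, y2, eps, lam):
    # Mean squared error between the sums of the original and imputed values
    mse = np.mean((y1 + y2 - x1 - x2) ** 2)
    # Penalty on the relative difference between the two sums
    constraint_loss = huber_loss_relative(x1 + x2, y1 + y2, eps)
    return mse + lam * constraint_loss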




# Constrained imputation for the two feeders
def constrained_imputation(data1_filtered: pd.DataFrame, data2_filtered: pd.DataFrame,
                           df1_original: pd.DataFrame, df2_original: pd.DataFrame,
                           eps=0.1, lam=0.7, max_iter=10000, tol=1e-9, learning_rate=0.01):
    # Locate the positions of the missing values (mask taken from the first dataset)
    value_missing = data1_filtered['value'].isnull()
    indexes_missing = np.where(value_missing)[0]
    n = len(indexes_missing)
    # Retrieve the actual values over the load-transfer periods
    y1 = df1_original['value'].iloc[indexes_missing].values
    y2 = df2_original['value'].iloc[indexes_missing].values
    # KNN imputation on both feeders to initialize
    # (taking column 0 assumes 'value' is the first remaining column)
    imputer1 = KNNImputer(n_neighbors=3)
    X = data1_filtered.drop(['horodate', 'gdo', 'Unnamed: 0'], axis=1)
    x01 = imputer1.fit_transform(X)[:, 0][indexes_missing]

    imputer2 = KNNImputer(n_neighbors=3)
    X = data2_filtered.drop(['horodate', 'gdo', 'Unnamed: 0'], axis=1)
    x02 = imputer2.fit_transform(X)[:, 0][indexes_missing]

    # Objective as a function of the stacked vector [x1, x2]
    fun = lambda x: objective(x[:n], x[n:], y1, y2, eps, lam)
    # Starting point x0: the KNN estimates
    x0 = np.concatenate([x01, x02])
    # Minimize the objective function (learning_rate is not used by L-BFGS-B)
    result = minimize(fun, x0, method='L-BFGS-B', options={'maxiter': max_iter, 'ftol': tol})
    # Extract the imputed values from the optimizer's solution
    x_imputed = result.x
    x1_imputed = x_imputed[:n]
    x2_imputed = x_imputed[n:]
    # Build the final tables, writing by position to avoid chained assignment
    df_imputed_1, df_imputed_2 = data1_filtered.copy(), data2_filtered.copy()
    df_imputed_1.iloc[indexes_missing, df_imputed_1.columns.get_loc('value')] = x1_imputed
    df_imputed_2.iloc[indexes_missing, df_imputed_2.columns.get_loc('value')] = x2_imputed

    return df_imputed_1, df_imputed_2

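However, I have the impression that even when I tune the function's parameters, the number of imputed values that satisfy the constraint barely changes. I suspect the objective function is the cause. What do you think? Which objective functions could I use for this problem, or is there another way to impute under a specific constraint?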
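One possible alternative, offered as a sketch under assumptions rather than a definitive answer: the constraint only involves the sum X1 + X2 at each position, so any starting estimate (for example the KNN values) can be projected onto the admissible band [(1 - eps) * S, (1 + eps) * S], with S = Y1 + Y2, by clipping the sum and rescaling both values. The function name `project_onto_constraint`, the proportional split between the two series, and the assumption that both sums are strictly positive are illustrative choices that the question does not state.

```python
import numpy as np

def project_onto_constraint(x1, x2, y1, y2, eps):
    """Rescale (x1, x2) so that x1 + x2 lies within eps (relative) of y1 + y2.

    Assumes y1 + y2 > 0 and x1 + x2 > 0 at every position.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    target = np.asarray(y1, dtype=float) + np.asarray(y2, dtype=float)
    x_sum = x1 + x2
    # Closest admissible sum to the current one
    clipped = np.clip(x_sum, (1 - eps) * target, (1 + eps) * target)
    # The same factor is applied to both series, which keeps their ratio
    scale = clipped / x_sum
    return x1 * scale, x2 * scale

# Small made-up example, not data from the question
y1 = np.array([10.0, 20.0])
y2 = np.array([30.0, 20.0])
x1 = np.array([11.0, 30.0])
x2 = np.array([30.0, 20.0])
p1, p2 = project_onto_constraint(x1, x2, y1, y2, eps=0.1)
rel = np.abs(p1 + p2 - (y1 + y2)) / (y1 + y2)
print(rel)
```

Positions that already satisfy the constraint are left unchanged, while violating positions end up on the boundary of the band. If the inequality must be strict, passing a slightly smaller `eps` to the projection is one way to leave a margin.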

python dataframe knn imputation objective-function