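I have two original datasets (they are related to each other in a specific way, but exactly how is not important here). Both contained outliers in the "value" column, which I removed, giving two new filtered datasets. My main goal is to impute the removed values. In addition, I want the imputed values to respect a constraint: the relative difference between the sum of the two original outlying values (Y1 + Y2) and the sum of the two imputed values (X1 + X2) must be below a threshold (a percentage, epsilon). I initialized the imputed values with KNN.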
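To make the constraint concrete, here is a minimal sketch of the check I have in mind. The helper name `constraint_satisfied` and the toy numbers are made up for illustration and are not part of my actual code.

```python
import numpy as np

def constraint_satisfied(x1, x2, y1, y2, eps):
    """Boolean mask: True where |(y1 + y2) - (x1 + x2)| <= eps * |y1 + y2|."""
    x_sum = np.asarray(x1, dtype=float) + np.asarray(x2, dtype=float)
    y_sum = np.asarray(y1, dtype=float) + np.asarray(y2, dtype=float)
    # Multiplying by |y_sum| instead of dividing avoids a division by zero.
    return np.abs(y_sum - x_sum) <= eps * np.abs(y_sum)

# Toy numbers (hypothetical, for illustration only)
y1 = np.array([10.0, 20.0, 0.0])
y2 = np.array([5.0, 10.0, 0.0])
x1 = np.array([9.5, 25.0, 0.0])
x2 = np.array([5.2, 12.0, 0.0])

mask = constraint_satisfied(x1, x2, y1, y2, eps=0.1)
print(mask)         # which pairs satisfy the constraint
print(mask.mean())  # fraction of pairs satisfying it
```

The mean of the mask is the quantity I look at when I say "the number of imputed values that satisfy the constraint".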
Here is the code I wrote:
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from sklearn.impute import KNNImputer

# Huber loss on the relative difference between x and y
def huber_loss_relative(x, y, eps):
    zero_mask = y == 0
    # Where y is zero, divide by x instead to avoid dividing by zero
    denom = np.where(zero_mask, x, y)
    diff = np.abs(y - x) / denom  # relative difference
    within = diff <= eps  # True where the difference is within the threshold
    # Quadratic inside the threshold, linear outside
    loss = np.where(within, 0.5 * diff ** 2, eps * (diff - 0.5 * eps))
    return np.mean(loss)

# Objective function: squared error on the sums plus the weighted constraint penalty
def objective(x1, x2, y1, y2, eps, lam):
    mse = np.mean((y1 + y2 - x1 - x2) ** 2)
    constraint_loss = huber_loss_relative(x1 + x2, y1 + y2, eps)
    return mse + lam * constraint_loss
# Constrained imputation for the two feeders
def constrained_imputation(data1_filtered: pd.DataFrame, data2_filtered: pd.DataFrame,
                           df1_original: pd.DataFrame, df2_original: pd.DataFrame,
                           eps=0.1, lam=0.7, max_iter=10000, tol=1e-9, learning_rate=0.01):
    # Locate the positions of the missing values (taken from the first dataset;
    # the same rows are used for the second one). learning_rate is currently unused.
    value_missing = data1_filtered['value'].isnull()
    indexes_missing = np.where(value_missing)[0]
    n_missing = len(indexes_missing)
    # Retrieve the actual values over the load-transfer periods
    y1 = df1_original['value'].to_numpy()[indexes_missing]
    y2 = df2_original['value'].to_numpy()[indexes_missing]
    # KNN imputation on both feeders to initialize
    # (column 0 of the imputer output is assumed to be 'value')
    imputer1 = KNNImputer(n_neighbors=3)
    X1 = data1_filtered.drop(['horodate', 'gdo', 'Unnamed: 0'], axis=1)
    x01 = imputer1.fit_transform(X1)[:, 0][indexes_missing]
    imputer2 = KNNImputer(n_neighbors=3)
    X2 = data2_filtered.drop(['horodate', 'gdo', 'Unnamed: 0'], axis=1)
    x02 = imputer2.fit_transform(X2)[:, 0][indexes_missing]
    # Starting point x0 for the optimizer: both KNN estimates, concatenated
    x0 = np.concatenate([x01, x02])
    # Function to minimize, taking the concatenated vector
    fun = lambda x: objective(x[:n_missing], x[n_missing:], y1, y2, eps, lam)
    # Minimize the objective function
    result = minimize(fun, x0, method='L-BFGS-B', options={'maxiter': max_iter, 'ftol': tol})
    # Extract the optimized values from the solver result
    x1_imputed = result.x[:n_missing]
    x2_imputed = result.x[n_missing:]
    # Build the final tables
    df_imputed_1, df_imputed_2 = data1_filtered.copy(), data2_filtered.copy()
    df_imputed_1.iloc[indexes_missing, df_imputed_1.columns.get_loc('value')] = x1_imputed
    df_imputed_2.iloc[indexes_missing, df_imputed_2.columns.get_loc('value')] = x2_imputed
    return df_imputed_1, df_imputed_2
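My impression is that even when I tune the function's parameters, the number of imputed values that satisfy the constraint does not really change. I suspect the objective function is the cause. What do you think? Which objective functions could I use for this problem, or is there another way to impute values under a specific constraint?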
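One alternative I have considered, as a rough sketch only: the constraint involves only the sum X1 + X2 of each pair, so the KNN estimates could be projected onto the allowed band instead of being penalized. The function name `project_onto_sum_band` and the toy numbers below are my own assumptions. The correction is split equally between the two values, which gives the closest point in the Euclidean sense but may not be the right split for my data.

```python
import numpy as np

def project_onto_sum_band(x1, x2, y1, y2, eps):
    """Shift (x1, x2) as little as possible so that x1 + x2 lies within
    eps * |y1 + y2| of y1 + y2 (hypothetical helper, not from my code above)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y_sum = np.asarray(y1, dtype=float) + np.asarray(y2, dtype=float)
    half_width = eps * np.abs(y_sum)
    s = x1 + x2
    # Clip each sum to the allowed band [y_sum - half_width, y_sum + half_width]
    s_clipped = np.clip(s, y_sum - half_width, y_sum + half_width)
    # Split the required change in the sum equally between the two values
    shift = (s_clipped - s) / 2.0
    return x1 + shift, x2 + shift

# Toy numbers (hypothetical): the first pair already satisfies the constraint,
# the second one does not.
y1 = np.array([10.0, 20.0])
y2 = np.array([5.0, 10.0])
x1_knn = np.array([9.5, 25.0])
x2_knn = np.array([5.2, 12.0])

p1, p2 = project_onto_sum_band(x1_knn, x2_knn, y1, y2, eps=0.1)
print(p1, p2)
```

Pairs that already satisfy the constraint are left untouched, and the others are moved onto the edge of the band. I am not sure whether this is statistically sound, which is partly why I am asking.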