X_train[var] = np.where(X_train[var].isin(frequent_ls), X_train[var], 'Rare')
How can I replace this NumPy call with PySpark SQL functions?
You can define a UDF:
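from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def dictionary(x):
    # Keep frequent values; label everything else "Rare", as in the NumPy version
    if x in frequent_ls:
        return x
    else:
        return "Rare"

replace = F.udf(lambda x: dictionary(x), StringType())
X_train = X_train.withColumn("var2", replace(F.col("var")))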