How do I write equivalent PySpark code for the following step?

    X_train[var] = np.where(X_train[var].isin(frequent_ls), X_train[var], 'Rare')

How can I replace the NumPy call with PySpark SQL functions?
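For reference, here is a minimal pandas/NumPy sketch of what the step above does. The DataFrame contents, the column name `color`, and the values in `frequent_ls` are hypothetical stand-ins, since the question does not show them. Values found in `frequent_ls` are kept; every other value becomes `'Rare'`.

```python
import numpy as np
import pandas as pd

# Hypothetical training data and list of frequent categories (not from the question)
X_train = pd.DataFrame({"color": ["red", "blue", "red", "green", "purple"]})
frequent_ls = ["red", "blue"]
var = "color"

# Keep values that appear in frequent_ls; replace everything else with 'Rare'
X_train[var] = np.where(X_train[var].isin(frequent_ls), X_train[var], "Rare")

print(X_train[var].tolist())
# ['red', 'blue', 'red', 'Rare', 'Rare']
```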

python pyspark pyspark-sql

You can define a UDF (user-defined function):

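    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # frequent_ls is the list of frequent categories from the question
    def replace_rare(x):
        # Keep frequent values; map everything else (including None) to 'Rare'
        if x in frequent_ls:
            return x
        else:
            return "Rare"

    # Wrap the Python function as a Spark UDF that returns a string column
    replace_rare_udf = F.udf(replace_rare, StringType())

    # Overwrite the column in place, as the NumPy version does
    X_train = X_train.withColumn(var, replace_rare_udf(F.col(var)))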
