How to get the numeric columns from a PySpark DataFrame and compute z-scores

Question  Votes: 0  Answers: 1
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.appName("example").getOrCreate()
df = sparkSession.read.json('hdfs://localhost/abc/zscore/')

I am able to read the data from HDFS, and I want to compute the z-score for the numeric columns only.

pyspark hdfs pyspark-sql
1 Answer
1
vote

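You can convert the DataFrame to pandas and compute the z-scores there. Note that toPandas() collects the entire dataset onto the driver, so this only works when the data fits in driver memory.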

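from pyspark.sql import SparkSession
from scipy.stats import zscore

sparkSession = SparkSession.builder.appName("example").getOrCreate()
# Read the JSON files from HDFS and convert the result to a pandas DataFrame
df = sparkSession.read.json('hdfs://localhost/SmartRegression/zscore/').toPandas()
# select_dtypes is the public pandas API for picking out numeric columns
num_cols = df.select_dtypes(include='number').columns
results = df[num_cols].apply(zscore)
print(results)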
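
If the data is too large to collect with toPandas(), the same calculation can probably be done inside Spark. The following is only a sketch and is not part of the original answer: the names NUMERIC_TYPES, numeric_column_names and add_zscores are made up for illustration, and it assumes the numeric columns show up in df.dtypes as tinyint, smallint, int, bigint, float, double or decimal(...). It uses stddev_pop so that the result lines up with the default of scipy.stats.zscore (ddof=0).

```python
# Type names, as reported by df.dtypes, that this sketch treats as numeric (an assumption).
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def numeric_column_names(dtypes):
    """Return the names of numeric columns from (name, type) pairs such as df.dtypes."""
    return [
        name
        for name, dtype in dtypes
        if dtype in NUMERIC_TYPES or dtype.startswith("decimal")
    ]


def add_zscores(df):
    """Return df with every numeric column replaced by its z-score."""
    # Imported here so that numeric_column_names() can be used without Spark installed.
    from pyspark.sql import functions as F

    num_cols = numeric_column_names(df.dtypes)
    if not num_cols:
        return df

    # A single aggregation computes the mean and the population standard
    # deviation of every numeric column.
    aggs = []
    for c in num_cols:
        aggs.append(F.mean(F.col(c)).alias(c + "_mean"))
        aggs.append(F.stddev_pop(F.col(c)).alias(c + "_std"))
    stats = df.agg(*aggs).first()

    # z = (x - mean) / std, applied column by column; non-numeric columns are left untouched.
    for c in num_cols:
        df = df.withColumn(c, (F.col(c) - stats[c + "_mean"]) / stats[c + "_std"])
    return df
```

With the DataFrame from the question, this would be called as add_zscores(df).show(). Only the small row of per-column statistics is brought back to the driver; the z-scores themselves stay distributed.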