Accessing elements of a vector in a Spark DataFrame without using a UDF in PySpark


I'm trying to solve the problem described in "Access element of a vector in a Spark DataFrame (Logistic Regression probability vector)", but without using a UDF in PySpark.

I've seen plenty of options for Scala, but none for PySpark.

python apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml
1 Answer
You can use pyspark.ml.feature.VectorSlicer. It takes a feature vector and outputs a new feature vector containing a sub-array of the original features. For example:

    from pyspark.ml.feature import VectorSlicer
    from pyspark.ml.linalg import Vectors

    df = spark.createDataFrame([
        (Vectors.dense([-2.0, 2.3, 0.0, 0.0, 1.0]),),
        (Vectors.dense([0.0, 0.0, 0.0, 0.0, 0.0]),),
        (Vectors.dense([0.6, -1.1, -3.0, 4.5, 3.3]),),
    ], ["features"])

    vs = VectorSlicer(inputCol="features", outputCol="sliced", indices=[1, 4])
    print(vs.transform(df).head().sliced)
    # DenseVector([2.3, 1.0]) -- the elements at positions 1 and 4
    # of the first 'features' vector in the DataFrame
