Split a feature vector into columns using Java


I have the following DataFrame:

+---------------+--------------------+
|IndexedArtistID|     recommendations|
+---------------+--------------------+
|           1580|[[919, 0.00249262...|
|           4900|[[41749, 7.143963...|
|           5300|[[0, 2.0147272E-4...|
|           6620|[[208780, 9.81092...|
+---------------+--------------------+

I want to split the recommendations column so that I get the following DataFrame:

+---------------+--------------------+
|IndexedArtistID|     recommendations|
+---------------+--------------------+
|           1580|919                 |
|           1580|0.00249262          |
|           4900|41749               |
|           4900|7.143963            |
|           5300|0                   |
|           5300|2.0147272E-4        |
|           6620|208780              |
|           6620|9.81092             |
+---------------+--------------------+

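So basically, I want to split the feature vector into columns and then merge those columns into a single column. The merge part is described in How to split single row into multiple rows in Spark DataFrame using Java. Now, how do I do the split part in Java? For Scala it is explained here: Spark Scala: How to convert Dataframe[vector] to DataFrame[f1:Double, ..., fn: Double)], but I could not find a way to do the same thing in Java as shown in that link.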
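A minimal sketch of the split step described above, written in plain Java rather than against Spark: each row is modelled as a LinkedHashMap, and each vector element becomes its own column named f1, f2, ... (the naming used in the linked Scala question). The class and method names (VectorSplitSketch, splitToColumns) are hypothetical, and treating the bracketed values as a double[] is an assumption about the column's contents. This illustrates the logic only; it is not a Spark API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VectorSplitSketch {

    // Hypothetical helper: turn one (id, vector) pair into a "row" whose
    // columns are IndexedArtistID, f1, ..., fn. Assumes the vector is a double[].
    static Map<String, Object> splitToColumns(int id, double[] vector) {
        Map<String, Object> row = new LinkedHashMap<>(); // keeps column order
        row.put("IndexedArtistID", id);
        for (int i = 0; i < vector.length; i++) {
            row.put("f" + (i + 1), vector[i]); // f1, f2, ... one column per element
        }
        return row;
    }

    public static void main(String[] args) {
        // Sample values copied from the question's (truncated) DataFrame output
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(splitToColumns(1580, new double[] {919, 0.00249262}));
        rows.add(splitToColumns(5300, new double[] {0, 2.0147272E-4}));
        rows.forEach(System.out::println);
    }
}
```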

apache-spark apache-spark-sql spark-java
1 Answer

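Here is the Scala code; the same approach can be reused in Java. Each element of the array in the recommendations column becomes a separate row.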

    import spark.implicits._  // encoders needed by flatMap and toDF

    // Sample data: each IndexedArtistID has an array of doubles
    val df1 = spark.createDataFrame(Seq(
      (1, Seq(1.0, 2.2)),
      (2, Seq(2.0, 3.8)),
      (3, Seq(4.0, 5.3))
    )).toDF("IndexedArtistID", "recommendations")

    df1.show()

    // Emit one (IndexedArtistID, value) row per element of the array
    df1.flatMap { r =>
      val id = r.getAs[Int]("IndexedArtistID")
      r.getAs[Seq[Double]]("recommendations").map(c => (id, c))
    }.toDF("IndexedArtistID", "recommendations").show(false)
Input 
+---------------+---------------+
|IndexedArtistID|recommendations|
+---------------+---------------+
|              1|     [1.0, 2.2]|
|              2|     [2.0, 3.8]|
|              3|     [4.0, 5.3]|
+---------------+---------------+
Output
+---------------+---------------+
|IndexedArtistID|recommendations|
+---------------+---------------+
|1              |1.0            |
|1              |2.2            |
|2              |2.0            |
|2              |3.8            |
|3              |4.0            |
|3              |5.3            |
+---------------+---------------+
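For readers who want the answer's flatMap in Java, here is a rough equivalent on plain Java collections (Java 16+ for the record), so it runs without a Spark session. Each (IndexedArtistID, array) pair is expanded with Stream.flatMap into one row per array element, mirroring the Scala code above. The FlattenSketch class and IdValue record are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlattenSketch {

    // One output row: an artist ID paired with a single value from its array.
    record IdValue(int id, double value) {}

    // Java counterpart of the Scala flatMap: every element of each row's array
    // becomes its own (IndexedArtistID, value) row, keeping input order.
    static List<IdValue> flatten(List<Map.Entry<Integer, List<Double>>> rows) {
        return rows.stream()
                .flatMap(row -> row.getValue().stream()
                        .map(v -> new IdValue(row.getKey(), v)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same sample data as the Scala answer
        List<Map.Entry<Integer, List<Double>>> rows = List.of(
                Map.entry(1, List.of(1.0, 2.2)),
                Map.entry(2, List.of(2.0, 3.8)),
                Map.entry(3, List.of(4.0, 5.3)));

        System.out.println("IndexedArtistID|recommendations");
        for (IdValue r : flatten(rows)) {
            System.out.println(r.id() + "|" + r.value());
        }
    }
}
```

Within Spark's Java API itself, an array-typed column (not an ML Vector) can usually be expanded the same way with org.apache.spark.sql.functions.explode, which produces one output row per array element; the sketch above only shows the underlying logic.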