Spark DataFrame withColumn with a when condition

Question

My requirement is as follows.

I join two DataFrames, as shown below.

     var c = a.join(b,keys,"fullouter")

c.printSchema() prints the following.

     |-- add: string (nullable = true)
     |-- sub: string (nullable = true)
     |-- delete: string (nullable = true)
     |-- mul: long (nullable = true)
     |-- ADD: string (nullable = true)
     |-- SUB: string (nullable = true)
     |-- DELETE: string (nullable = true)
     |-- MUL: long (nullable = true)
Everything works up to this point.

Now I am calling withColumn with a condition, as shown below.

     val d = c.withColumn("column", when(c("a.add") === c("b.ADD"), "Neardata"))

The error is as follows.

    Exception in thread "main" org.apache.spark.sql.AnalysisException: 
    Cannot resolve column name "a.add"

I also tried the following.

     val d = c.withColumn("column", when(col("a.add") === col("b.ADD"), "Neardata"))

This also fails with an error.

Please suggest a fix.
scala apache-spark
1 Answer

You have to define aliases with dataframe.as("a") and dataframe1.as("b") before you can refer to columns as "a.<name>" and "b.<name>".

Example:


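  import org.apache.spark.sql.functions.{col, when}
  import spark.sqlContext.implicits._

  val data = List(("James","","Smith","36636","M",60000),
    ("Michael","Rose","","40288","M",70000),
    ("Robert","","Williams","42114","",400000),
    ("Maria","Anne","Jones","39192","F",500000),
    ("Jen","Mary","Brown","","F",0))

  val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
  // Alias the DataFrame as "a" so its columns can be referenced as "a.<name>"
  val df = spark.createDataFrame(data).toDF(cols:_*).as("a")
  // The new column is literally named "a.new_gender"; the dot is part of the name
  val df2 = df.withColumn("a.new_gender", when(col("a.gender") === "M","Male")
    .when(col("a.gender") === "F","Female")
    .otherwise("Unknown"))
  // show() returns Unit, so call it separately instead of assigning its result to df2
  df2.show()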

Output:

+----------+-----------+---------+-----+------+------+------------+
|first_name|middle_name|last_name|  dob|gender|salary|a.new_gender|
+----------+-----------+---------+-----+------+------+------------+
|     James|           |    Smith|36636|     M| 60000|        Male|
|   Michael|       Rose|         |40288|     M| 70000|        Male|
|    Robert|           | Williams|42114|      |400000|     Unknown|
|     Maria|       Anne|    Jones|39192|     F|500000|      Female|
|       Jen|       Mary|    Brown|     |     F|     0|      Female|
+----------+-----------+---------+-----+------+------+------------+

I think that, without an alias, you are trying to access the columns like this, which is probably the cause of the error.

  val df2 = df.withColumn("df.new_gender", when(col("df.gender") === "M","Male")
    .when(col("df.gender") === "F","Female")
    .otherwise("Unknown")).show
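
Applied to the join in the question, the alias approach might look like the following sketch. The `id` key column, the sample rows, the local SparkSession, and the names `AliasJoinSketch` and `labelMatches` are assumptions made for illustration; they do not come from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object AliasJoinSketch {

  // Hypothetical helper: alias both inputs, full-outer join them on `keys`,
  // then add "column" = "Neardata" where a.add equals b.ADD (null otherwise).
  def labelMatches(a: DataFrame, b: DataFrame, keys: Seq[String]): DataFrame = {
    val c = a.as("a").join(b.as("b"), keys, "fullouter")
    // The "a." and "b." prefixes resolve because of the aliases set above
    c.withColumn("column", when(col("a.add") === col("b.ADD"), "Neardata"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch only; a real job would use its own session
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data with an assumed join key "id"
    val a = Seq((1, "x", 10L), (2, "y", 20L)).toDF("id", "add", "mul")
    val b = Seq((1, "x", 10L), (3, "z", 30L)).toDF("id", "ADD", "MUL")

    labelMatches(a, b, Seq("id")).show()
    spark.stop()
  }
}
```

Rows where either side is missing after the full outer join get a null in `column`, because the comparison with a null value does not evaluate to true and no `otherwise` branch is given.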
