My requirement is as follows.
I join two DataFrames as shown below.
var c = a.join(b,keys,"fullouter")
c.printSchema() gives the following:
|-- add: string (nullable = true)
|-- sub: string (nullable = true)
|-- delete: string (nullable = true)
|-- mul: long (nullable = true)
|-- ADD: string (nullable = true)
|-- SUB: string (nullable = true)
|-- DELETE: string (nullable = true)
|-- MUL: long (nullable = true)
It's good until here.
Now I am adding a column with withColumn and a condition, as shown below:
val d = c.withColumn("column", when(c("a.add") === c("b.ADD"),
"Neardata"))
The error is as follows:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot resolve column name "a.add"
I also tried the following:
val d = c.withColumn("column", when(col("a.add") === col("b.ADD"), "Neardata"))
This fails with an error as well.
Please suggest.
You have to define aliases with dataframe.as("a") and dataframe1.as("b").
Example:
import org.apache.spark.sql.functions.{col, when}
import spark.sqlContext.implicits._

val data = List(("James","","Smith","36636","M",60000),
  ("Michael","Rose","","40288","M",70000),
  ("Robert","","Williams","42114","",400000),
  ("Maria","Anne","Jones","39192","F",500000),
  ("Jen","Mary","Brown","","F",0))
val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
// Alias the DataFrame as "a" so its columns can be referenced as a.<column>
val df = spark.createDataFrame(data).toDF(cols:_*).as("a")
// show returns Unit, so keep the DataFrame in df2 and display it separately
val df2 = df.withColumn("a.new_gender", when(col("a.gender") === "M","Male")
  .when(col("a.gender") === "F","Female")
  .otherwise("Unknown"))
df2.show()
Output:
+----------+-----------+---------+-----+------+------+------------+
|first_name|middle_name|last_name|  dob|gender|salary|a.new_gender|
+----------+-----------+---------+-----+------+------+------------+
|     James|           |    Smith|36636|     M| 60000|        Male|
|   Michael|       Rose|         |40288|     M| 70000|        Male|
|    Robert|           | Williams|42114|      |400000|     Unknown|
|     Maria|       Anne|    Jones|39192|     F|500000|      Female|
|       Jen|       Mary|    Brown|     |     F|     0|      Female|
+----------+-----------+---------+-----+------+------+------------+
I think you are trying to access the columns like this, without defining an alias, and that is probably the cause of the error:
val df2 = df.withColumn("df.new_gender", when(col("df.gender") === "M","Male")
.when(col("df.gender") === "F","Female")
.otherwise("Unknown")).show
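Applied to the join from the question, the alias approach could look like the sketch below. It is only an illustration under assumptions: the sample rows, the join key "id", the local SparkSession settings, and the names AliasJoinSketch and tagNearData are hypothetical and are not taken from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object AliasJoinSketch {

  // Builds two small aliased DataFrames, joins them, and tags matching rows.
  def tagNearData(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical sample data; "id" stands in for the join keys of the question.
    val a = Seq((1, "x"), (2, "y")).toDF("id", "add").as("a")
    val b = Seq((1, "x"), (3, "z")).toDF("id", "ADD").as("b")
    val keys = Seq("id")

    val c = a.join(b, keys, "fullouter")

    // The qualified names "a.add" and "b.ADD" can be resolved because both
    // sides of the join were given an alias before joining.
    c.withColumn("column", when(col("a.add") === col("b.ADD"), "Neardata"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-join-sketch")
      .getOrCreate()

    tagNearData(spark).show()
    spark.stop()
  }
}
```

Because the when expression has no otherwise branch, rows where the two columns differ, or where one side is missing after the full outer join, get null in the new column.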