Hi all, I have two DataFrames. I compare their values column by column and, based on the result, assign a value to a new column of the joined DataFrame. Every scenario works except the comparison of null fields: if a value is null in both DataFrames the row should show "Varified", but I get "Not Varified". My DataFrames, the code I'm using, and the resulting DataFrame are below.
scala> df1.show()
+---+-----+---+--------+------+-------+
| id| name|age|lastname|  city|country|
+---+-----+---+--------+------+-------+
|  1|rohan| 26|  sharma|mumbai|  india|
|  2|rohan| 26|  sharma|  null|  india|
|  3|rohan| 26|    null|mumbai|  india|
|  4|rohan| 26|  sharma|mumbai|  india|
+---+-----+---+--------+------+-------+
scala> df2.show()
+----+------+-----+----------+------+---------+
|o_id|o_name|o_age|o_lastname|o_city|o_country|
+----+------+-----+----------+------+---------+
|   1| rohan|   26|    sharma|mumbai|    india|
|   2| rohan|   26|    sharma|  null|    india|
|   3| rohan|   26|    sharma|mumbai|    india|
|   4| rohan|   26|      null|mumbai|    india|
+----+------+-----+----------+------+---------+
val df3 = df1.join(df2, df1("id") === df2("o_id"))
  .withColumn("result", when(df1("name") === df2("o_name") &&
    df1("age") === df2("o_age") &&
    df1("lastname") === df2("o_lastname") &&
    df1("city") === df2("o_city") &&
    df1("country") === df2("o_country"), "Varified")
    .otherwise("Not Varified"))
// show() returns Unit, so call it separately to keep df3 a DataFrame
df3.show()
+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
| id| name|age|lastname|  city|country|o_id|o_name|o_age|o_lastname|o_city|o_country|      result|
+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
|  1|rohan| 26|  sharma|mumbai|  india|   1| rohan|   26|    sharma|mumbai|    india|    Varified|
|  2|rohan| 26|  sharma|  null|  india|   2| rohan|   26|    sharma|  null|    india|Not Varified|
|  3|rohan| 26|    null|mumbai|  india|   3| rohan|   26|    sharma|mumbai|    india|Not Varified|
|  4|rohan| 26|  sharma|mumbai|  india|   4| rohan|   26|      null|mumbai|    india|Not Varified|
+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
I want id '2' to show as 'Varified' too, but because city is null in both DataFrames it shows as 'Not Varified'. Can someone please tell me how to modify my df3 query so that it also handles nulls and id '2' shows 'Varified' in the result column?
Use <=> (null-safe equality) instead of ===:
val df3 = df1.join(df2, df1("id") === df2("o_id"))
  .withColumn("result", when(df1("name") <=> df2("o_name") &&
    df1("age") <=> df2("o_age") &&
    df1("lastname") <=> df2("o_lastname") &&
    df1("city") <=> df2("o_city") &&
    df1("country") <=> df2("o_country"), "Varified")
    .otherwise("Not Varified"))
// show() returns Unit, so call it separately to keep df3 a DataFrame
df3.show()
spark.sql("SELECT NULL AS city1, NULL AS city2").select($"city1" <=> $"city2").show
Result:
+-----------------+
|(city1 <=> city2)|
+-----------------+
|             true|
+-----------------+
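In your when + otherwise expression, either use <=>, or add an || branch that checks .isNull on both sides for the lastname and city columns.
null = null returns null rather than true, which is why those rows are not matched: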
spark.sql("select null=null").show()
//+-------------+
//|(NULL = NULL)|
//+-------------+
//|         null|
//+-------------+
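To make the three-valued logic concrete without a Spark session, here is a minimal plain-Scala sketch. It is only a model: None stands in for SQL NULL, and the names NullSafeEq, sqlEq, nullSafeEq and verdict are hypothetical, not part of any Spark API.

```scala
// Hypothetical model of SQL comparison semantics; None plays the role of NULL.
object NullSafeEq {
  // Like SQL `=` / Spark `===`: a NULL on either side makes the result NULL.
  def sqlEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // Like SQL `<=>`: never NULL; two NULLs compare as equal.
  def nullSafeEq[A](a: Option[A], b: Option[A]): Boolean = a == b

  // `when(cond, ...)` takes its branch only when cond is true,
  // so a NULL condition falls through to `otherwise`.
  def verdict(cond: Option[Boolean]): String =
    if (cond.contains(true)) "Varified" else "Not Varified"
}
```

In this model the city comparison for id 2 is sqlEq(None, None), which is None, so verdict falls through to "Not Varified"; nullSafeEq(None, None) is true, which corresponds to what <=> gives.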
Using <=> and isnull():
spark.sql("select null<=>null, isnull(null) = isnull(null)").show()
//+---------------+---------------------------------+
//|(NULL <=> NULL)|((NULL IS NULL) = (NULL IS NULL))|
//+---------------+---------------------------------+
//|           true|                             true|
//+---------------+---------------------------------+
Example:
df1.join(df2, df1("id") === df2("o_id")).
  withColumn("result", when((df1("name") === df2("o_name")) && (df1("age") === df2("o_age")) &&
    // Match on equal values, or on both sides being null. Comparing the two isNull
    // flags with === would also be true when both values are non-null but different.
    (df1("lastname") === df2("o_lastname") || (df1("lastname").isNull && df2("o_lastname").isNull)) &&
    (df1("city") === df2("o_city") || (df1("city").isNull && df2("o_city").isNull)) &&
    (df1("country") === df2("o_country")), "Varified").otherwise("Not Varified")).
  show()
// or using <=>
df1.join(df2, df1("id") === df2("o_id")).
  withColumn("result", when((df1("name") === df2("o_name")) && (df1("age") === df2("o_age")) &&
    (df1("lastname") <=> df2("o_lastname")) &&
    (df1("city") <=> df2("o_city")) &&
    (df1("country") === df2("o_country")), "Varified").otherwise("Not Varified")).
  show()
//+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
//| id| name|age|lastname|  city|country|o_id|o_name|o_age|o_lastname|o_city|o_country|      result|
//+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
//|  1|rohan| 26|  sharma|mumbai|  india|   1| rohan|   26|    sharma|mumbai|    india|    Varified|
//|  2|rohan| 26|  sharma|  null|  india|   2| rohan|   26|    sharma|  null|    india|    Varified|
//|  3|rohan| 26|    null|mumbai|  india|   3| rohan|   26|    sharma|mumbai|    india|Not Varified|
//|  4|rohan| 26|  sharma|mumbai|  india|   4| rohan|   26|      null|mumbai|    india|Not Varified|
//+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
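If more columns need to be compared, the condition could be built from a list of column names instead of being written out by hand. The following is only a sketch for spark-shell: withVerification is a hypothetical helper, and it assumes, as in the question, that each column c of the first DataFrame is paired with a column named "o_" + c in the second, joined on id and o_id.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.when

// Hypothetical helper: assumes right-hand columns are named "o_" + left-hand name.
def withVerification(left: DataFrame, right: DataFrame, cols: Seq[String]): DataFrame = {
  // Null-safe comparison for every column pair, combined with AND.
  val allMatch: Column = cols
    .map(c => left(c) <=> right(s"o_$c"))
    .reduce(_ && _)

  left.join(right, left("id") === right("o_id"))
    .withColumn("result", when(allMatch, "Varified").otherwise("Not Varified"))
}

// Usage with the DataFrames from the question:
withVerification(df1, df2, Seq("name", "age", "lastname", "city", "country")).show()
```

Because every pair uses <=>, a column that is null on both sides counts as a match, while a null on only one side does not.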