Filter and log null values from a Spark DataFrame

Question

I have this DataFrame:

+------+-------------------+-----------+
|brand |original_timestamp |weight     |
+------+-------------------+-----------+
|BR1   |1632899456         |4.0        |
|BR2   |1632899456         |null       |
|BR3   |1632899456         |2.0        |
|BR4   |1632899155         |2.0        |
|BR5   |1632899155         |null       |
+------+-------------------+-----------+

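I want to drop (filter out) the rows with null values and print a message for each one, for example:

"Weight for brand BR2 is null, dropping it from the data"

"Weight for brand BR5 is null, dropping it from the data"

I am using Spark 3.2.2 with SQLContext, in Scala.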

dataframe scala apache-spark apache-spark-sql nullpointerexception
1 Answer

You need to use the filter function together with isNull or isNotNull to get the desired output.
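As a minimal, self-contained sketch of that idea, the snippet below rebuilds the sample data from the question and splits it into null and non-null rows. The object name NullWeightSplit, the helper splitOnNullWeight, the local SparkSession, and the column types (String, Long, nullable Double) are assumptions made for illustration; they are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NullWeightSplit {
  // Hypothetical helper: returns (rows where weight is null, rows where weight is not null).
  def splitOnNullWeight(df: DataFrame): (DataFrame, DataFrame) =
    (df.filter(df("weight").isNull), df.filter(df("weight").isNotNull))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-weight-split")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question; None becomes a null weight.
    val df = Seq[(String, Long, Option[Double])](
      ("BR1", 1632899456L, Some(4.0)),
      ("BR2", 1632899456L, None),
      ("BR3", 1632899456L, Some(2.0)),
      ("BR4", 1632899155L, Some(2.0)),
      ("BR5", 1632899155L, None)
    ).toDF("brand", "original_timestamp", "weight")

    val (nullRows, validRows) = splitOnNullWeight(df)
    nullRows.show(false)  // rows that would be dropped
    validRows.show(false) // rows that are kept

    spark.stop()
  }
}
```

Using df("weight") instead of $"weight" avoids needing the implicits import inside the helper; both forms refer to the same column.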

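Filter null values using isNull

// The $"..." column syntax needs the session implicits in scope
// (import spark.implicits._ or sqlContext.implicits._; spark-shell imports them already).
df.filter($"weight".isNull).show(false)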
+-----+------------------+------+
|brand|original_timestamp|weight|
+-----+------------------+------+
|BR2  |1632899456        |NULL  |
|BR5  |1632899155        |NULL  |
+-----+------------------+------+
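Print the messages

// Build one message string per null-weight row, bring the strings to the driver, and print them.
df.filter($"weight".isNull)
  .selectExpr("""concat('\"Weight for brand ', brand, ' is null, dropping it from the data\"') message""")
  .as[String]
  .collect
  .foreach(println)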
"Weight for brand BR2 is null, dropping it from the data"
"Weight for brand BR5 is null, dropping it from the data"
Filter non-null values using isNotNull

df.filter($"weight".isNotNull).show(false)

+-----+------------------+------+
|brand|original_timestamp|weight|
+-----+------------------+------+
|BR1  |1632899456        |4.0   |
|BR3  |1632899456        |2.0   |
|BR4  |1632899155        |2.0   |
+-----+------------------+------+
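
The two filters can also be combined into one step that logs each dropped brand and returns the cleaned data. This is only a sketch under assumptions: the object name DropNullWeights and the helpers message and dropAndLog are hypothetical, the messages are printed on the driver with println rather than through a logging framework, and the number of null rows is assumed to be small enough to collect.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DropNullWeights {
  // Message wording taken from the question.
  def message(brand: String): String =
    s"Weight for brand $brand is null, dropping it from the data"

  // Hypothetical helper: prints one message per null-weight row, then returns the remaining rows.
  def dropAndLog(df: DataFrame): DataFrame = {
    df.filter(col("weight").isNull)
      .select("brand")
      .collect() // assumption: few enough null rows to fit on the driver
      .foreach(row => println(message(row.getString(0))))
    df.filter(col("weight").isNotNull)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-null-weights")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq[(String, Long, Option[Double])](
      ("BR1", 1632899456L, Some(4.0)),
      ("BR2", 1632899456L, None),
      ("BR3", 1632899456L, Some(2.0)),
      ("BR4", 1632899155L, Some(2.0)),
      ("BR5", 1632899155L, None)
    ).toDF("brand", "original_timestamp", "weight")

    val cleaned = dropAndLog(df)
    cleaned.show(false)

    spark.stop()
  }
}
```

Note that this runs two Spark jobs over the input (one for the messages, one for the kept rows), so caching df beforehand may be worthwhile if it is expensive to compute.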