I have this DataFrame:
+------+-------------------+-----------+
|brand |original_timestamp |weight     |
+------+-------------------+-----------+
|BR1   |1632899456         |4.0        |
|BR2   |1632899456         |null       |
|BR3   |1632899456         |2.0        |
|BR4   |1632899155         |2.0        |
|BR5   |1632899155         |null       |
+------+-------------------+-----------+
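I want to drop (filter out) the rows with a null weight and print a message for each one, for example:
"Weight for brand BR2 is null, dropping it from the data"
"Weight for brand BR5 is null, dropping it from the data"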
I'm using Spark 3.2.2 with SQLContext, in Scala.
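To reproduce this locally, here is a minimal sketch that rebuilds the sample DataFrame. It assumes the column types are String, Long and a nullable Double (inferred from the printed table, not stated in the question) and creates a SparkSession; in Spark 3.x the legacy SQLContext is still reachable as spark.sqlContext. The names SampleData and sampleDf are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SampleData {
  // Hypothetical helper that rebuilds the question's table. The column types
  // (String, Long, nullable Double) are assumptions read off the printed output.
  def sampleDf(spark: SparkSession): DataFrame = {
    // Provides toDF here; the same import enables $"col" and .as[String].
    import spark.implicits._
    // Option[Double] turns the missing weights into real SQL NULLs.
    Seq(
      ("BR1", 1632899456L, Some(4.0)),
      ("BR2", 1632899456L, None),
      ("BR3", 1632899456L, Some(2.0)),
      ("BR4", 1632899155L, Some(2.0)),
      ("BR5", 1632899155L, None)
    ).toDF("brand", "original_timestamp", "weight")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experiments; spark.sqlContext exposes the legacy
    // SQLContext if existing code depends on it.
    val spark = SparkSession.builder()
      .appName("null-weight-demo")
      .master("local[*]")
      .getOrCreate()

    val df = sampleDf(spark)
    df.printSchema()
    df.show(false)

    spark.stop()
  }
}
```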
Use the filter function together with isNull and isNotNull to get the desired output.
Select the null values with isNull
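import sqlContext.implicits._ // enables $"col" and .as[String]; assumes a val named sqlContext (use spark.implicits._ with a SparkSession)
df.filter($"weight".isNull).show(false)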
+-----+------------------+------+
|brand|original_timestamp|weight|
+-----+------------------+------+
|BR2 |1632899456 |NULL |
|BR5 |1632899155 |NULL |
+-----+------------------+------+
Print the messages
df.filter($"weight".isNull)
.selectExpr("""
concat(
'\"Weight for brand ',
brand,
' is null, dropping it from the data\"'
) message
""")
.as[String]
.collect()
.foreach(println)
"Weight for brand BR2 is null, dropping it from the data"
"Weight for brand BR5 is null, dropping it from the data"
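The SQL-string version needs escaped quotes inside concat. A sketch of the same message built with the Column functions concat, lit and col from org.apache.spark.sql.functions avoids that escaping. NullWeightMessages and messages are hypothetical names, and the sample data again assumes a nullable Double weight column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

object NullWeightMessages {
  // Hypothetical helper: one message per row whose weight is null.
  // Ordinary Scala string literals replace the escaped quotes of the SQL version.
  def messages(df: DataFrame): Array[String] =
    df.filter(col("weight").isNull)
      .select(concat(
        lit("\"Weight for brand "),
        col("brand"),
        lit(" is null, dropping it from the data\"")
      ).as("message"))
      .collect()
      .map(_.getString(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("null-weight-messages")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the question; column types are assumptions.
    val df = Seq(
      ("BR1", 1632899456L, Some(4.0)),
      ("BR2", 1632899456L, None),
      ("BR3", 1632899456L, Some(2.0)),
      ("BR4", 1632899155L, Some(2.0)),
      ("BR5", 1632899155L, None)
    ).toDF("brand", "original_timestamp", "weight")

    messages(df).foreach(println)

    spark.stop()
  }
}
```

Note that concat returns null if any argument is null, which is harmless here only because brand itself is never null in the sample.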
Keep the non-null values with isNotNull
df.filter($"weight".isNotNull).show(false)
+-----+------------------+------+
|brand|original_timestamp|weight|
+-----+------------------+------+
|BR1 |1632899456 |4.0 |
|BR3 |1632899456 |2.0 |
|BR4 |1632899155 |2.0 |
+-----+------------------+------+
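The two steps can be combined into one helper that prints the messages and returns the cleaned DataFrame. This is a minimal sketch, not part of the answer above: DropNullWeights and dropNullWeights are hypothetical names, it assumes the number of null rows is small enough to collect on the driver, and it builds the message with Scala string interpolation after collecting only the brand column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DropNullWeights {
  // Hypothetical helper: prints a message for each brand whose weight is
  // null, then returns only the rows where weight is present.
  def dropNullWeights(df: DataFrame): DataFrame = {
    // Only the brand column of the null rows is brought to the driver,
    // which assumes there are few enough of them to collect.
    val nullBrands = df.filter(col("weight").isNull)
      .select("brand")
      .collect()
      .map(_.getString(0))

    nullBrands.foreach { brand =>
      println(s"Weight for brand $brand is null, dropping it from the data")
    }

    // Keep the rows with a weight. df.na.drop(Seq("weight")) is close,
    // but it also drops NaN weights, not just nulls.
    df.filter(col("weight").isNotNull)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("drop-null-weights")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the question; column types are assumptions.
    val df = Seq(
      ("BR1", 1632899456L, Some(4.0)),
      ("BR2", 1632899456L, None),
      ("BR3", 1632899456L, Some(2.0)),
      ("BR4", 1632899155L, Some(2.0)),
      ("BR5", 1632899155L, None)
    ).toDF("brand", "original_timestamp", "weight")

    dropNullWeights(df).show(false)

    spark.stop()
  }
}
```

The helper triggers one job to collect the null brands and another when the returned DataFrame is used, so if df is expensive to compute, calling df.cache() beforehand avoids reading the source twice.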