Filter a DataFrame based on a condition


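Hi, I'm trying to filter a DataFrame on a condition: when the condition matches I want to apply a schema, otherwise I want to leave the value as it is.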

import org.apache.spark.sql.types._
val schema = ArrayType(StructType(StructField("packQty", FloatType, true) :: StructField("gtin", StringType, true) :: Nil))
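To show what this schema produces, here is a minimal sketch that parses one of the sample gtins strings with from_json in a local Spark session. The object name ParseGtins and the method firstPack are illustrative, not part of the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

object ParseGtins {
  // The schema from the question: an array of {packQty, gtin} structs.
  val schema: ArrayType = ArrayType(StructType(
    StructField("packQty", FloatType, true) ::
    StructField("gtin", StringType, true) :: Nil))

  // Parses one gtins string from the question's sample data and returns
  // (packQty, gtin) of the first array element. Assumes a local session.
  def firstPack(): (Float, String) = {
    val spark = SparkSession.builder().master("local[1]").appName("parse-gtins").getOrCreate()
    import spark.implicits._

    val df = Seq("""[{"packQty":120.0,"gtin":"00052000042276"}]""").toDF("gtins")
    val parsed = df.withColumn("jsonData", from_json($"gtins", schema))

    // Prints jsonData as an array of structs with packQty (float) and gtin (string).
    parsed.printSchema()

    val first = parsed.select($"jsonData"(0)("packQty"), $"jsonData"(0)("gtin")).head()
    (first.getFloat(0), first.getString(1))
  }

  def main(args: Array[String]): Unit = println(firstPack())
}
```

The column produced by from_json has exactly the type given in the schema, which is why any other branch of a when/otherwise on this column has to match it.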


+--------------+---------+-----------+------------+-------------------------------------------+
|orderupcnumber|enrichqty|allocoutqty|allocatedqty|gtins                                      |
+--------------+---------+-----------+------------+-------------------------------------------+
|5203754       |15.0     |1.0        |5.0         |[{"packQty":120.0,"gtin":"00052000042276"}]|
|5203754       |15.0     |1.0        |2.0         |[{"packQty":120.0,"gtin":"00052000042276"} |
|5243700       |25.0     |1.0        |2.0         |na                                         |
+--------------+---------+-----------+------------+-------------------------------------------+

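If the gtins column is not "na", I want to add a column parsed with the schema; if it is "na", I want to put 0 instead. But this throws an error: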

 df.withColumn("jsonData",when($"gtins"=!="na",from_json($"gtins",schema)).otherwise(0))


 Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'CASE 
 WHEN contains(`gtins`, 'na') THEN 0 ELSE jsontostructs(`gtins`) END' due to data type 
 mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;;
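One way to resolve this mismatch is to make the ELSE branch a null of the same type, by casting a null literal to the schema. The sketch below assumes a local SparkSession and uses only two of the sample rows; the object name TypedOtherwise and the method withJsonData are illustrative. Dropping .otherwise(...) entirely has the same effect, because when without otherwise returns null for rows that do not match.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{from_json, lit, when}
import org.apache.spark.sql.types._

object TypedOtherwise {
  val schema: ArrayType = ArrayType(StructType(
    StructField("packQty", FloatType, true) ::
    StructField("gtin", StringType, true) :: Nil))

  def withJsonData(): DataFrame = {
    val spark = SparkSession.builder().master("local[1]").appName("typed-otherwise").getOrCreate()
    import spark.implicits._

    // Values taken from the question's sample table.
    val df = Seq(
      ("5203754", """[{"packQty":120.0,"gtin":"00052000042276"}]"""),
      ("5243700", "na")
    ).toDF("orderupcnumber", "gtins")

    // Both branches now have the type array<struct<packQty,gtin>>:
    // the ELSE branch is a null literal cast to the same schema.
    df.withColumn("jsonData",
      when($"gtins" =!= "na", from_json($"gtins", schema))
        .otherwise(lit(null).cast(schema)))
  }

  def main(args: Array[String]): Unit = withJsonData().show(false)
}
```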


 df.select($"orderupcnumber",$"enrichqty",$"allocoutqty",$"allocatedqty",explode($"jsonData").as("jsonData"))


 +--------------+---------+-----------+------------+-------------------------------------------+-----------------------+
 |orderupcnumber|enrichqty|allocoutqty|allocatedqty|gtins                                      |jsonData               |
 +--------------+---------+-----------+------------+-------------------------------------------+-----------------------+
 |5203754       |15.0     |1.0        |5.0         |[{"packQty":120.0,"gtin":"00052000042276"}]|[120.0, 00052000042276]|
 |5203754       |15.0     |1.0        |2.0         |[{"packQty":120.0,"gtin":"00052000042276"} |[120.0, 00052000042276]|
 |5243700       |25.0     |1.0        |2.0         |na                                         |null                   |
 +--------------+---------+-----------+------------+-------------------------------------------+-----------------------+


 df.select($"orderupcnumber", $"enrichqty", $"allocoutqty", $"allocatedqty", $"jsonData.packQty".as("packQty"), $"jsonData.gtin".as("gtin"))

This select only returns the rows where jsonData is not null.

How can I include the rows where it is null as well?
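explode produces no output rows for a null (or empty) array, which is why the "na" row disappears before the final select. explode_outer (available since Spark 2.2) keeps such rows and emits a single null element instead. A minimal sketch with two of the sample rows, assuming a local SparkSession; the object name KeepNullRows and the method flattened are illustrative:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{explode_outer, from_json, when}
import org.apache.spark.sql.types._

object KeepNullRows {
  val schema: ArrayType = ArrayType(StructType(
    StructField("packQty", FloatType, true) ::
    StructField("gtin", StringType, true) :: Nil))

  def flattened(): DataFrame = {
    val spark = SparkSession.builder().master("local[1]").appName("keep-null-rows").getOrCreate()
    import spark.implicits._

    // Values taken from the question's sample table.
    val df = Seq(
      ("5203754", 15.0, 1.0, 5.0, """[{"packQty":120.0,"gtin":"00052000042276"}]"""),
      ("5243700", 25.0, 1.0, 2.0, "na")
    ).toDF("orderupcnumber", "enrichqty", "allocoutqty", "allocatedqty", "gtins")

    // when without otherwise leaves jsonData null for the "na" row,
    // and the column keeps the array<struct> type of from_json.
    val withJson = df.withColumn("jsonData",
      when($"gtins" =!= "na", from_json($"gtins", schema)))

    // explode_outer keeps the row with a null array; its struct fields become null.
    withJson
      .select($"orderupcnumber", $"enrichqty", $"allocoutqty", $"allocatedqty",
        explode_outer($"jsonData").as("jsonData"))
      .select($"orderupcnumber", $"enrichqty", $"allocoutqty", $"allocatedqty",
        $"jsonData.packQty".as("packQty"), $"jsonData.gtin".as("gtin"))
  }

  def main(args: Array[String]): Unit = flattened().show(false)
}
```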

scala apache-spark
1 answer (0 votes)

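This happens because the two branches of your condition return different data types: when the condition matches you return the array of structs produced by from_json, and when it does not you return the integer 0. Both branches must have the same data type (or types that Spark can coerce to a common type).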

Input

+------------------------------------------+
|gtins                                     |
+------------------------------------------+
|[{"packQty":120.0,"gtin":"0005200004227"}]|
|[{"packQty":120.0}]                       |
+------------------------------------------+

Code

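import org.apache.spark.sql.functions.{from_json, when}

// Both branches are StringType: the gtin of the first parsed element, or the string "0".
data.withColumn("jsonData",
    when(from_json($"gtins", schema)(0)("gtin").isNotNull,
         from_json($"gtins", schema)(0)("gtin"))
      .otherwise("0"))
  .show(false)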

Output


+------------------------------------------+-------------+
|gtins                                     |jsonData     |
+------------------------------------------+-------------+
|[{"packQty":120.0,"gtin":"0005200004227"}]|0005200004227|
|[{"packQty":120.0}]                       |0            |
+------------------------------------------+-------------+