The code below works well where I want JSON output. Well, almost; see the code further down.
It returns:
+--------------------------------------------------------------------------------------------------------+
|afterImage |
+--------------------------------------------------------------------------------------------------------+
|{\n "PRICE" : "3599",\n "PRODUCT_ID" : "230117",\n "DESCRIPTION" : "Hamsberry vintage tee, cherry"\n}|
|{\n "PRICE" : "4000",\n "PRODUCT_ID" : "230117",\n "DESCRIPTION" : "Hamsberry vintage tee, cherry"\n}|
|{\n "NUM" : "20",\n "PRODUCT_ID" : "230117",\n "DESCRIPTION" : "Hamsberry vintage tee, cherry"\n} |
+--------------------------------------------------------------------------------------------------------+
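However, I would like to get rid of the \n. I could write code for that, but I am wondering whether there is a better way, either without using Jackson or with a different Jackson method. Of course, regexp_replace is an option.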
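For reference, the regexp_replace route could look roughly like the sketch below. It assumes a local SparkSession and a hypothetical one-row DataFrame standing in for df5; the helper name stripNewlines and the pattern \n\s* are illustrative choices, not part of the original code. The pattern removes each line break together with the indentation that follows it, and leaves the spaces around the colons untouched.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object StripNewlinesWithRegex {
  // Hypothetical helper: drop every line break plus the whitespace right after it.
  def stripNewlines(df: DataFrame, column: String): DataFrame =
    df.withColumn(column, regexp_replace(col(column), "\\n\\s*", ""))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("strip-newlines")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for df5: one pretty-printed JSON string per row.
    val prettyDf = Seq(
      "{\n  \"PRICE\" : \"3599\",\n  \"PRODUCT_ID\" : \"230117\"\n}"
    ).toDF("afterImage")

    stripNewlines(prettyDf, "afterImage").show(false)
    spark.stop()
  }
}
```

This works on the rendered string only, so it is a post-processing step rather than a change to how the JSON is produced.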
I looked at "Spark dataframe to jackson json in scala", but that is not the approach I have in mind. I like Diff and pretty the most.
import org.json4s._
import org.json4s.jackson.JsonMethods._ // provides parse, render and pretty
import spark.implicits._
import org.apache.spark.sql.functions.{col, lit, when, from_json, map_keys, map_values}
import org.apache.spark.sql.types.{MapType, StringType}
val path = "/FileStore/tables/json_0002C_file.txt"
val df = spark.read.text(path) // String
val df2 = df.withColumn("value", from_json(col("value"), MapType(StringType, StringType))) // Always 3 elements. Plus add in one later potentially.
val df3 = df2.select(map_values(col("value")))
val df4 = df3.select($"map_values(value)"(0).as("meta"), $"map_values(value)"(1).as("data"), $"map_values(value)"(2).as("key"))
def getAfterImage = (data: String, key: String) => {
  val jsonData = parse(data)
  val jsonKey = parse(key)
  // Diff the key against the data, then merge the changed and deleted fields into the after-image.
  val Diff(changed, added, deleted) = jsonKey diff jsonData
  val afterImage = changed merge deleted
  pretty(render(afterImage)) // String, need to add the dummy missing columns, for those that have been dropped but...
}
val afterImage = spark.udf.register("callUDFAI", getAfterImage)
val df5 = df4.withColumn("afterImage", afterImage(col("data"), col("key"))).select("afterImage")//.show(false)
df5.show(false)
df5.printSchema()
ANSWER: Clearly and formally answered; left as is to help others.
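One way to avoid the line breaks at the source, sketched under the assumption that json4s-jackson is on the classpath: JsonMethods offers compact alongside pretty, and compact renders the JValue on a single line. The function below mirrors getAfterImage with that one call swapped; the name afterImageCompact and the sample strings are hypothetical and only there for illustration.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object CompactAfterImage {
  // Same diff/merge logic as getAfterImage, rendered with compact instead of pretty.
  def afterImageCompact(data: String, key: String): String = {
    val jsonData = parse(data)
    val jsonKey = parse(key)
    val Diff(changed, _, deleted) = jsonKey diff jsonData
    compact(render(changed merge deleted))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical inputs: the key carries an old PRICE, the data carries the new one.
    val key = """{"PRODUCT_ID":"230117","DESCRIPTION":"Hamsberry vintage tee, cherry","PRICE":"1000"}"""
    val data = """{"PRICE":"3599"}"""
    println(afterImageCompact(data, key)) // single-line JSON, no \n
  }
}
```

In the Spark job, the same function could be registered as the UDF in place of getAfterImage, which keeps Diff and merge unchanged and only alters the final rendering step.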