I have the following use cases:
1. "apple+case"
2. "apple+case+10+cover"
3. "apple+case+10++cover"
4. "+apple"
5. "iphone8+"
Currently, I am replacing "+" with a space:
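import org.apache.spark.sql.functions.udf
import spark.implicits._ // enables the $"value" column syntax

// Null-safe: replace every "+" with a space, then trim the result
def normalizer(value: String): String = {
  if (value == null) {
    null
  } else {
    value.replaceAll("\\+", " ").trim
  }
}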
val testUDF = udf(normalizer(_: String): String)
df.withColumn("newCol", testUDF($"value"))
But this replaces every "+". How can I replace only the "+" between words and still handle cases like "apple+case+10++cover" => "apple case 10+ cover"?
The output should be:
1. "apple case"
2. "apple case 10 cover"
3. "apple case 10+ cover"
4. "apple"
5. "iphone8+"
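For comparison, here is a minimal plain-Scala sketch (no Spark needed) of what the current approach returns for the five inputs. It assumes the intended body of the question's normalizer is value.replaceAll("\\+", " ").trim; the names CurrentBehavior and currentNormalizer are hypothetical. Cases 3 and 5 diverge from the expected output: "apple+case+10++cover" becomes "apple case 10  cover" (two spaces) and "iphone8+" becomes "iphone8".

```scala
// Plain-Scala check of the current approach: every "+" becomes a space,
// then the result is trimmed. Names here are hypothetical.
object CurrentBehavior {
  def currentNormalizer(value: String): String =
    if (value == null) null else value.replaceAll("\\+", " ").trim

  def main(args: Array[String]): Unit = {
    val inputs = Seq("apple+case", "apple+case+10+cover",
      "apple+case+10++cover", "+apple", "iphone8+")
    // Quote the result so doubled spaces are visible
    inputs.foreach(s => println(s"$s => '${currentNormalizer(s)}'"))
  }
}
```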
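You can try two regex replacements: first turn every "+" that is not followed by another "+" and is not at the end of the string into a space, then strip the leading/trailing whitespace that this leaves behind (e.g. for "+apple"). Apply them to the raw column, not to the UDF output, which has already lost every "+":
import org.apache.spark.sql.functions.{col, regexp_replace}
// Inner: "+" not followed by "+" and not at end of string -> space
// Outer: remove leading/trailing whitespace
df.withColumn("newCol", regexp_replace(
  regexp_replace(col("value"), """\+(?!\+|$)""", " "),
  """^\s+|\s+$""", "")).show()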
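Because Spark's regexp_replace uses Java regex syntax, the same patterns can be checked in plain Scala with String.replaceAll. This is a minimal sketch, assuming the rule "a "+" survives only when it is immediately followed by another "+" or ends the string"; the object and value names (PlusNormalizer, normalize, InnerPlus, EdgeSpace) are hypothetical. It checks all five use cases against the expected output.

```scala
// Plain-Scala sketch of the two-step rule (no Spark needed).
// All names here are hypothetical, chosen for illustration.
object PlusNormalizer {
  // Step 1: a "+" that is neither followed by another "+" nor at the end
  // of the string becomes a space ("10++cover" -> "10+ cover").
  private val InnerPlus = """\+(?!\+|$)"""
  // Step 2: strip leading/trailing whitespace left by step 1 ("+apple").
  private val EdgeSpace = """^\s+|\s+$"""

  def normalize(value: String): String =
    if (value == null) null
    else value.replaceAll(InnerPlus, " ").replaceAll(EdgeSpace, "")

  def main(args: Array[String]): Unit = {
    val expected = Seq(
      "apple+case"           -> "apple case",
      "apple+case+10+cover"  -> "apple case 10 cover",
      "apple+case+10++cover" -> "apple case 10+ cover",
      "+apple"               -> "apple",
      "iphone8+"             -> "iphone8+"
    )
    expected.foreach { case (in, out) =>
      val got = normalize(in)
      println(s"$in => $got")
      assert(got == out, s"$in: expected '$out', got '$got'")
    }
  }
}
```

If a UDF is preferred, the same function could be wrapped with udf(PlusNormalizer.normalize _), although built-in functions such as regexp_replace generally avoid UDF overhead and stay visible to the Catalyst optimizer.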