How do I replace a delimiter only where it appears between certain things?

Question · Votes: 1 · Answers: 1

I have the following test cases:

1. "apple+case"
2. "apple+case+10+cover"
3. "apple+case+10++cover"
4. "+apple"
5. "iphone8+"

Currently I replace every + with a space:

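import org.apache.spark.sql.functions.udf

// Replaces every "+" with a space, then trims; returns null for null input.
def normalizer(value: String): String = {
  if (value == null) {
    null
  } else {
    value.replaceAll("\\+", " ").trim
  }
}

val testUDF = udf(normalizer(_: String): String)

// $"value" needs spark.implicits._ in scope
df.withColumn("newCol", testUDF($"value"))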

But this replaces every "+". How can I replace only the "+" that sits between strings, while also handling cases such as "apple+case+10++cover" => "apple case 10+ cover"?

The output should be
1. "apple case"
2. "apple case 10 cover"
3. "apple case 10+ cover"
4. "apple"
5. "iphone8+"
regex scala apache-spark regex-lookarounds regexp-replace
1 Answer
0
votes

You can try two regex replacements:

// Backslashes are doubled because these are ordinary Scala string literals.
df.withColumn("newCol", regexp_replace(
    regexp_replace($"value", "(?<=\\d)\\+(?!\\+)", "+ "),
    "(?<!\\d)\\+", " ")).show
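Tracing the two-pass expression by hand on case 2 ("apple+case+10+cover") appears to give "apple case 10+ cover" rather than "apple case 10 cover": the first pass rewrites a + that follows a digit as "+ ", and the second pass then leaves it alone. The sketch below is one possible alternative, not the answerer's method. It assumes the intended rule is "replace a + with a space only when a non-+ character follows it", so the first + of a ++ pair and a trailing + survive, and it trims the result to drop the space left by a leading +. The names PlusNormalizer, twoPass and normalize are hypothetical, and the code uses String.replaceAll (Java regex) so that it runs without Spark.

```scala
object PlusNormalizer {
  // The two-pass expression from the answer, reproduced with String.replaceAll
  // so its behaviour can be compared without a Spark session.
  def twoPass(value: String): String =
    value
      .replaceAll("(?<=\\d)\\+(?!\\+)", "+ ")
      .replaceAll("(?<!\\d)\\+", " ")

  // Assumed rule: a "+" becomes a space only if the next character exists
  // and is not another "+". Null input is passed through unchanged.
  def normalize(value: String): String =
    if (value == null) null
    else value.replaceAll("\\+(?=[^+])", " ").trim

  def main(args: Array[String]): Unit = {
    // Inputs and expected outputs taken from the question.
    val cases = Seq(
      "apple+case"           -> "apple case",
      "apple+case+10+cover"  -> "apple case 10 cover",
      "apple+case+10++cover" -> "apple case 10+ cover",
      "+apple"               -> "apple",
      "iphone8+"             -> "iphone8+"
    )
    cases.foreach { case (input, expected) =>
      val single = normalize(input)
      val double = twoPass(input).trim
      println(s"'$input': single-pass='$single', two-pass='$double', expected='$expected'")
    }
  }
}
```

In Spark the same pattern could probably be applied without a UDF as trim(regexp_replace($"value", "\\+(?=[^+])", " ")), since regexp_replace takes Java regular expressions; that form has not been run against the asker's DataFrame. Inputs the question does not cover, such as "++apple", would come out as "+ apple" under this assumed rule.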