How to remove '\' from a string in sparklyr

Question

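I am using sparklyr and have a Spark DataFrame with a column word that contains words, some of which include special characters I want to remove. Using regexp_replace with \\\\ in front of the special character works, like this: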

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\(', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\)', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\+', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\?', '')) %>%
  mutate(word = regexp_replace(word, '\\\\:', '')) %>%
  mutate(word = regexp_replace(word, '\\\\;', '')) %>%
  mutate(word = regexp_replace(word, '\\\\!', ''))

Now I want to remove \. I have tried both of the following:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\\', ''))

and:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\', ''))

But neither of them works.

Answer 1

You have to escape the backslash for both the R side and the Java side, so what you need is "\\\\\\\\":

df <- copy_to(sc, tibble(word = "(abc\\zyx: 1)"))

df %>% mutate(regexp_replace(word, "\\\\\\\\", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word           `regexp_replace(word, "\\\\\\\\\\\\\\\\", "")`
  <chr>          <chr>                                         
1 "(abc\\zyx: 1)" (abczyx: 1)
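To see why eight backslashes are needed, it can help to count the layers locally. The following is a minimal sketch in base R, not part of the original answer. It assumes Spark's default string-literal parsing, where the SQL parser unescapes the pattern once more before the Java regex engine sees it. It also shows why both attempts in the question fail before reaching Spark: in '\\\\\' and '\', the R parser reads the final \' as an escaped quote, so the string literal does not end where intended.

```r
# The R literal from the answer: eight backslashes in the source code.
pattern <- "\\\\\\\\"

# Layer 1: the R parser halves them, so the string holds four characters.
nchar(pattern)        # 4
writeLines(pattern)   # prints \\\\

# Layer 2 (assumed default Spark behaviour): the SQL string-literal parser
# halves them again, leaving \\ .
# Layer 3: the Java regex engine reads \\ as one literal backslash.

# Local check with base R, where only layers 1 and 3 apply,
# so four backslashes in the source are enough:
x <- "(abc\\zyx: 1)"            # the string holds a single backslash
cleaned <- gsub("\\\\", "", x)
cleaned                         # "(abczyx: 1)"
```

The local gsub call is only a stand-in for checking the regex itself; the sparklyr call still needs the extra level of escaping shown above.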

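Depending on your exact requirements, it may be easier to match all characters at once. For example, you can keep only word characters (\w) and whitespace (\s):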

df %>% mutate(regexp_replace(word, "[^\\\\w\\\\s]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w\\\\\\\\s]", "")`
  <chr>           <chr>                                                
1 "(abc\\zyx: 1)" abczyx 1

Or keep only word characters:

df %>% mutate(regexp_replace(word, "[^\\\\w]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w]", "")`
  <chr>           <chr>                                      
1 "(abc\\zyx: 1)" abczyx1
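If you only want to drop the specific characters listed in the question, plus the backslash, a single character class may be enough. This is a sketch rather than tested Spark output: the variable names are made up for illustration, and the Spark pattern assumes the same double escaping as above. Inside a character class only the backslash needs escaping; the other characters are literal there.

```r
# Hypothetical names, for illustration only.
spark_pattern <- "[\\\\\\\\()+?:;!]"   # for regexp_replace() through sparklyr
local_pattern <- "[\\\\()+?:;!]"       # the same class for base R gsub()

# Local check of the character class with base R (no Spark needed).
x <- "(abc\\zyx: 1+2?)!"
cleaned <- gsub(local_pattern, "", x, perl = TRUE)
cleaned                                # "abczyx 12"

# With a Spark table df as in the answer (not run here):
# df %>% mutate(word = regexp_replace(word, "[\\\\\\\\()+?:;!]", ""))
```

This replaces the chain of seven mutate calls from the question with one, and unlike the \w and \s variants it leaves every other character untouched.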
