Is there a function in the bundled Clojure libraries for partitioning a string around another string?


I know about the clojure.string/split function, which returns a sequence of the parts of the string that do not include the given pattern.

(require '[clojure.string :as str-utils])
(str-utils/split "Yes, hello, this is dog yes hello it is me" #"hello")
;; -> ["Yes, " ", this is dog yes " " it is me"]

However, I am trying to find a function that keeps the tokens as elements in the returned vector. So it would look like this:

(split-around "Yes, hello, this is dog yes hello it is me" #"hello")
;; -> ["Yes, " "hello" ", this is dog yes " "hello" " it is me"]

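Is there such a function in any of the bundled libraries? In an external library? I have been trying to write it myself, but have not been able to figure it out.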

string split clojure clojurescript
Answers

6 votes

You can also do this with regex lookahead/lookbehind:

user> (clojure.string/split "Yes, hello, this is dog yes hello it is me" #"(?<=hello)|(?=hello)")
;;=> ["Yes, " "hello" ", this is dog yes " "hello" " it is me"]

You can read it as "split at the zero-length positions that are immediately preceded or immediately followed by the word 'hello'".

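Note that it also drops the dangling empty strings that would otherwise appear between adjacent matches and at the leading and trailing ends: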

user> (clojure.string/split "helloYes, hello, this is dog yes hellohello it is mehello" #"(?<=hello)|(?=hello)")
;;=> ["hello"
;;    "Yes, "
;;    "hello"
;;    ", this is dog yes "
;;    "hello"
;;    "hello"
;;    " it is me"
;;    "hello"]

You can wrap it in a function like this, for example:

(defn split-around [source word]
  (let [word (java.util.regex.Pattern/quote word)]
    (->> (format "(?<=%s)|(?=%s)" word word)       
         re-pattern
         (clojure.string/split source))))
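
A short usage sketch may help show why the `Pattern/quote` call matters. The input `"1+2+3"` is an illustrative assumption, not from the original answer, and the definition is repeated only so the snippet runs on its own. Without the quoting, `+` would be a dangling quantifier inside the lookarounds, and I would expect `re-pattern` to reject the resulting pattern.

```clojure
(require '[clojure.string :as str])

;; Same definition as in the answer above, repeated so this snippet is self-contained.
(defn split-around [source word]
  (let [word (java.util.regex.Pattern/quote word)]   ; treat `word` as a literal, not a regex
    (->> (format "(?<=%s)|(?=%s)" word word)
         re-pattern
         (str/split source))))

;; "+" is a regex metacharacter; quoting makes it match literally.
(split-around "1+2+3" "+")
;;=> ["1" "+" "2" "+" "3"]
```

Because the separator is quoted, this version only handles literal strings; passing a regex would need a different construction.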

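3 votes

(require '[clojure.string :as str])
;; Wrap each match in "~", then split on "~"; assumes the input never contains "~".
(-> "Yes, hello, this is dog yes hello it is me"
    (str/replace #"hello" "~hello~")
    (str/split #"~"))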
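
The sentinel trick breaks as soon as the input itself contains the sentinel character. A hedged variant, assuming the input never contains the NUL character, is sketched below. The name `split-around-sentinel` is hypothetical, and the `$0` in the replacement string is Java's back-reference to the whole match.

```clojure
(require '[clojure.string :as str])

(defn split-around-sentinel
  "Sketch: wraps every match of `re` in NUL characters, splits on NUL,
   and drops the empty strings left by adjacent or leading matches.
   Assumes `s` contains no NUL character."
  [s re]
  (->> (str/split (str/replace s re "\u0000$0\u0000") #"\u0000")
       (remove empty?)
       vec))

(split-around-sentinel "a~hello~b" #"hello")
;;=> ["a~" "hello" "~b"]
```

This still makes two passes over the string, so it is a convenience rather than an efficient solution.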

0 votes

An example using @Shlomi's solution:

(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require [clojure.string :as str]))

(dotest
  (let [input-str "Yes, hello, this is dog yes hello it is me"
        segments  (mapv str/trim
                    (str/split input-str #"hello"))
        result    (interpose "hello" segments)]
    (is= segments ["Yes," ", this is dog yes" "it is me"])
    (is= result ["Yes," "hello" ", this is dog yes" "hello" "it is me"])))

Update

It is probably best to write a custom loop for this use case. For example:

(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require
    [clojure.string :as str] ))

(defn strseg
  "Will segment a string like '<a><tgt><b><tgt><c>' at each occurrence of `tgt`, producing
   an output vector like [ <a> <tgt> <b> <tgt> <c> ]."
  [tgt source]
  (let [tgt-len  (count tgt)
        segments (loop [result []
                        src    source]
                   (if (empty? src)
                     result
                     (let [i (str/index-of src tgt)]
                       (if (nil? i)
                         (let [result-next (into result [src])
                               src-next    nil]
                           (recur result-next src-next))
                         (let [pre-tgt     (subs src 0 i)
                               result-next (into result [pre-tgt tgt])
                               src-next    (subs src (+ tgt-len i))]
                           (recur result-next src-next))))))
        result   (vec
                   (remove (fn [s] (or (nil? s)
                                     (empty? s)))
                     segments))]
    result))

with unit tests:

(dotest
  (is= (strseg "hello" "Yes, hello, this is dog yes hello it is me")
    ["Yes, " "hello" ", this is dog yes " "hello" " it is me"] )
  (is= (strseg "hello" "hello")
    ["hello"])
  (is= (strseg "hello" "") [])
  (is= (strseg "hello" nil) [])
  (is= (strseg "hello" "hellohello") ["hello" "hello" ])
  (is= (strseg "hello" "abchellodefhelloxyz") ["abc" "hello" "def" "hello" "xyz" ])
  )

0 votes

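Here is another solution. It avoids the problems with repeated patterns and double recognition that leetwinski's answer has (see my comment), and it also computes the parts as lazily as possible: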

(defn partition-str [s sep]
  (->> s
       (re-seq
         (->> sep
              java.util.regex.Pattern/quote ; remove this to treat sep as a regex
              (format "((?s).*?)(?:(%s)|\\z)")
              re-pattern))
       (mapcat rest)
       (take-while some?)
       (remove empty?))) ; remove this to keep empty parts

However, this does not behave intuitively when the separator matches the empty string.

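Another approach is to use both re-seq and split with the same pattern and interleave the resulting sequences, as shown in this related question. Unfortunately, this way each occurrence of the separator is recognized twice.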
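
A minimal sketch of that interleaving idea follows. It is my own reconstruction, not the code from the related question. The name `split-interleave` is hypothetical, and the sketch assumes `re` has no capturing groups and never matches the empty string. The limit of `-1` keeps trailing empty strings so that the parts and separators line up.

```clojure
(require '[clojure.string :as str])

(defn split-interleave
  "Sketch: scans `s` twice, once with split and once with re-seq,
   then weaves separators back in between the parts."
  [s re]
  (let [parts (str/split s re -1)   ; n separators yield n+1 parts
        seps  (re-seq re s)]
    (->> (cons (first parts) (interleave seps (rest parts)))
         (remove empty?)            ; drop the empty parts around adjacent matches
         vec)))

(split-interleave "abchellodefhelloxyz" #"hello")
;;=> ["abc" "hello" "def" "hello" "xyz"]
```

The two scans are exactly the double recognition mentioned above.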

Perhaps a better approach is to build on the more primitive re-matcher and re-find.

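Finally, to answer the original question more directly: there is no such function in Clojure's standard library or in any external library. Moreover, I do not know of any simple and completely problem-free way to solve this (especially with a regex separator).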


Update

This is the best solution I can think of right now. It works at a lower level, lazily, and with a regex separator:

(defn re-partition [re s]
  (let [mr (re-matcher re s)]
    ((fn rec [i]
       (lazy-seq
         (if-let [m (re-find mr)]
           (list* (subs s i (.start mr)) m (rec (.end mr)))
           (list (subs s i)))))
     0)))

(def re-partition+ (comp (partial remove empty?) re-partition))

Note that we could (re)define:

(def re-split (comp (partial take-nth 2) re-partition))

(def re-seq (comp (partial take-nth 2) rest re-partition))
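
To illustrate how these pieces relate, here is a small usage sketch. The inputs are illustrative assumptions, and `re-partition` is repeated from above only so the snippet runs on its own. Parts sit at the even positions of the result and separators at the odd positions, which is why `take-nth 2` recovers each of them.

```clojure
;; Definition repeated from the answer so this snippet is self-contained.
(defn re-partition [re s]
  (let [mr (re-matcher re s)]
    ((fn rec [i]
       (lazy-seq
         (if-let [m (re-find mr)]
           (list* (subs s i (.start mr)) m (rec (.end mr)))
           (list (subs s i)))))
     0)))

(def parts (re-partition #"\d+" "a1b22c"))

parts                      ;;=> ("a" "1" "b" "22" "c")
(take-nth 2 parts)         ;;=> ("a" "b" "c")   the pieces between separators
(take-nth 2 (rest parts))  ;;=> ("1" "22")      the separators themselves

;; Adjacent matches leave empty strings, which re-partition+ would remove.
(remove empty? (re-partition #"hello" "hellohello"))
;;=> ("hello" "hello")
```

Note that the matcher is stateful, so each result sequence should be consumed in order from a single call.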