Get every word ending with a dot using Regex/VBA

Question

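In an Office 2019 Excel spreadsheet, I am trying to extract from any given cell up to 5 words that end with a dot and come after a ].

Sample text:

some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan.

I expect:

ost. ult. lot. sino. collan.

I found this function online: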

Public Function RegExtract(Txt As String, Pattern As String) As String

With CreateObject("vbscript.regexp")
    '.Global = True
    .Pattern = Pattern
    If .test(Txt) Then
        RegExtract = .Execute(Txt)(0)
    Else
        RegExtract = "No match found"
    End If
End With

End Function

I call it from an empty cell:

=RegExtract(D2; "([\]])(\s\w+[.]){0,5}")

My expression:

([\]])(\s\w+[.]){0,5}

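returns:

] ost.

I can't get rid of the ] because \K does not work in Excel, so I need some other way to find where the useful part starts inside the text block.
I don't understand how the quantifier works to get "up to 5 occurrences".
I expected the {0,5} after the second group to mean: repeat the preceding group until the end of the text block (or until it has succeeded 5 times).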

--Added after accepting JdvD's answer--

I am using this pattern to get all words ending with a dot after the first occurrence of a closing bracket.

^.*?\]|(\w+\.\s?)|.

This one (without the question mark) gets all words ending with a dot after the last occurrence of a closing bracket.

^.*\]|(\w+\.\s?)|.

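Something was missing from my RegExtract function: I needed to store the matches in an array through a For loop and then output that array as a string. I had assumed the regex engine stored the matches as a single string.

Working function: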

Public Function RegExtract(Txt As String, Pattern As String) As String

Dim rMatch As Object, arrayMatches(), i As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    If .Test(Txt) Then
        For Each rMatch In .Execute(Txt)
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
            End If
        Next
        RegExtract = Join(arrayMatches, " ")
    Else
        RegExtract = "No match found"
    End If
End With

End Function

Answer 1

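Regex match:

In addition to the answer given by @RonRosenfeld, one could also apply what some call "the best regex trick ever": first match what you don't want, then match what you do want inside a capture group. For example: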

^.*\]|(\w+\.)

See the online demo. In short, this means:

```
^.*\]
```
- Match 0+ (greedy) characters from the start of the string up to the last occurrence of a closing square bracket;
```
|
```
- Or;
```
(\w+\.)
```
- A capture group holding 1+ (greedy) word characters followed by a literal dot.

Here is how it works in a UDF:

Sub Test()

Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "

Debug.Print RegExtract(s, "^.*\]|(\w+\.)")

End Sub

'------

'The above Sub would invoke the below function as an example.
'But you could also invoke this through: `=RegExtract(A1,"^.*\]|(\w+\.)")`
'on your sheet.

'------

Public Function RegExtract(Txt As String, Pattern As String) As String

Dim rMatch As Object, arrayMatches(), i As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    If .Test(Txt) Then
        For Each rMatch In .Execute(Txt)
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
            End If
        Next
        RegExtract = Join(arrayMatches, " ")
    Else
        RegExtract = "No match found"
    End If
End With

End Function
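As for why the `{0,5}` in the question stopped after one word: a quantifier on a group only counts back-to-back repetitions of that group, it does not skip over text in between. The sketch below is an illustration only (the `QuantifierDemo` name and the first test string are made up for the demo). The first call should print `] a. b. c.` because the three words are adjacent; the second prints `] ost.`, as reported in the question, because ` |Monday` does not fit `\s\w+[.]`.

```vba
' Illustration only: QuantifierDemo and the "x] a. b. c." string are hypothetical.
Sub QuantifierDemo()

    With CreateObject("VBScript.RegExp")
        .Pattern = "([\]])(\s\w+[.]){0,5}"

        ' Every repetition directly follows the previous one,
        ' so the group repeats three times.
        Debug.Print .Execute("x] a. b. c.")(0).Value

        ' After " ost." comes " |Monday": \s matches the space, but "|" is not
        ' a word character, so the repetition ends after a single word.
        Debug.Print .Execute("[asred.] ost. |Monday - Ribben (ult.) lot.")(0).Value
    End With

End Sub
```

That is why the patterns above match the unwanted text as a separate alternative instead of relying on a repeated group.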

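Regex replace:

Depending on your desired output, you could also use the replace function. You would then have to match any remaining characters with another alternative. For example: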

^.*\]|(\w+\.\s?)|.

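See the online demo. In short, this means we added another alternative, which is simply any single character. A second small addition is the optional whitespace character `\s?` in the second alternative.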

Sub Test()

Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "

Debug.Print RegReplace(s, "^.*\]|(\w+\.\s?)|.", "$1")

End Sub

'------

'There are now 3 parameters to pass to the UDF: String, Pattern and Replacement.

'------

Public Function RegReplace(Txt As String, Pattern As String, Replacement) As String

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    RegReplace = Trim(.Replace(Txt, Replacement))
End With

End Function

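Note that I used `Trim()` to remove a possible trailing space.

Both RegExtract and RegReplace currently return a single string containing the cleaned input, but the former does give you the option to work with the array in the arrayMatches() variable.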
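To make the array option concrete, here is a minimal sketch of a variant that hands the matches back as an array instead of joining them. `RegExtractAll` and `TestArray` are hypothetical names, and the sketch assumes the pattern contains at least one capture group and that the function is called from other VBA code, not from a worksheet cell.

```vba
' Sketch only: RegExtractAll is a hypothetical name. It mirrors RegExtract
' above but returns the captured values as an array.
Public Function RegExtractAll(Txt As String, Pattern As String) As Variant

    Dim rMatch As Object, arrayMatches() As Variant, i As Long

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = Pattern
        For Each rMatch In .Execute(Txt)
            ' Assumes the pattern has at least one capture group.
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
            End If
        Next
    End With

    If i = 0 Then
        RegExtractAll = Array()      ' zero-length array when nothing was captured
    Else
        RegExtractAll = arrayMatches
    End If

End Function

Sub TestArray()

    Dim parts As Variant, k As Long

    parts = RegExtractAll("some text [asred.] ost. (ult.) lot.", "^.*\]|(\w+\.)")
    For k = LBound(parts) To UBound(parts)
        Debug.Print k, parts(k)
    Next k

End Sub
```

Returning `Array()` for the empty case keeps the `LBound`/`UBound` loop in the caller safe, since an unallocated array would raise an error there.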

Answer 2

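There is a way to return all matches in a string that come after a specific pattern, but I can't recall it right now.

In the meantime, the simplest approach seems to be to remove everything before the first `]` and then apply the regex to the remainder.

For example: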

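' Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools > References).
Option Explicit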
Sub findit()
  Const str As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
  Dim RE As RegExp, MC As MatchCollection, M As Match
  Dim S As String
  Dim sOutput As String
  
S = Mid(str, InStr(str, "]"))

Set RE = New RegExp
With RE
    .Pattern = "\w+(?=\.)"
    .Global = True
    If .Test(S) = True Then
        Set MC = .Execute(S)
        For Each M In MC
            sOutput = sOutput & vbLf & M
        Next M
    End If
End With


MsgBox Mid(sOutput, 2)

End Sub

You could of course limit the number of matches to 5 by using a counter instead of the `For Each` loop.
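A counter-based version might look like the sketch below. It is not the answer's original code: `FirstFiveAfterBracket` and `MAX_MATCHES` are hypothetical names, it uses late binding so no library reference is needed, and it uses `\w+\.` instead of the lookahead so the dot is kept in the output.

```vba
' Sketch only: FirstFiveAfterBracket and MAX_MATCHES are hypothetical names.
Public Function FirstFiveAfterBracket(Txt As String) As String

    Const MAX_MATCHES As Long = 5
    Dim matches As Object, i As Long, pos As Long
    Dim result As String

    ' Position of the first "]"; 0 if there is none, in which case
    ' the whole text is searched.
    pos = InStr(Txt, "]")

    With CreateObject("VBScript.RegExp")   ' late binding
        .Global = True
        .Pattern = "\w+\."
        Set matches = .Execute(Mid$(Txt, pos + 1))
    End With

    ' Index-based loop so it can stop once MAX_MATCHES items are collected.
    For i = 0 To matches.Count - 1
        If i >= MAX_MATCHES Then Exit For
        result = result & " " & matches.Item(i).Value
    Next i

    FirstFiveAfterBracket = Mid$(result, 2)   ' drop the leading space

End Function
```

On a sheet this could be called as `=FirstFiveAfterBracket(D2)`; for the sample text it should return `ost. ult. lot. sino. collan.`.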

Answer 3

You can use the following regex

([a-zA-Z]+)\.

Let me explain.

[a-zA-Z]

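- This matches any letter from a to z or A to Z, but on its own it matches only a single letter.

+

- Tells the engine to keep matching letters until it finds a character that is not a to z or A to Z.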

\.

- With this, you simply require a literal . at the end of the match.

Here is an example.
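For use from VBA, a minimal sketch of applying this pattern could look like the following. `LettersBeforeDot` is a hypothetical name. Note that this pattern on its own does not skip the text before the bracket, so for the sample text in the question it would also pick up `asred.`, and unlike `\w` it ignores digits and underscores.

```vba
' Sketch only: LettersBeforeDot is a hypothetical name.
Public Function LettersBeforeDot(Txt As String) As String

    Dim m As Object, result As String

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "([a-zA-Z]+)\."
        For Each m In .Execute(Txt)
            ' SubMatches(0) is group 1: the letters without the trailing dot,
            ' so the dot is appended again for the output.
            result = result & " " & m.SubMatches(0) & "."
        Next
    End With

    LettersBeforeDot = Mid$(result, 2)   ' drop the leading space

End Function
```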
