在Outlook VBA正则表达式的效率

问题描述 投票:8回答:2

我是新来的Outlook VBA(Office 365中版)和我想实现的是循环通过所有的邮件文件夹中(的“[email protected]”,“收件箱”),移动电子邮件,其中标的相匹配的特定正则表达式到不同的文件夹中。

由于这是第一次,我使用的是Outlook VBA,和我不熟悉它的对象模型,我一直在试图拼凑出一个解决方案。

以下是我迄今为止(我学会了写简单的例子为组件步骤,然后建立到最终复合函数):

Sub RegExpMoveEmailToFolderSO()
    Dim MyFolder As Outlook.Folder
    Dim MyNS As NameSpace
    Dim MyEmail As Outlook.MailItem
    Dim MyItems As Outlook.Items
    Dim CountMatches As Integer
    Dim MySubject As String
    Dim MyRegExp As RegExp
    Dim MyDestinationFolder As Outlook.Folder


    Set MyNS = Application.GetNamespace("MAPI")
    Set MyFolder = MyNS.Folders("[email protected]").Folders("Inbox")
    Set MyDestinationFolder = MyNS.Folders("[email protected]").Folders("Inbox")
    Set MyItems = MyFolder.Items
    Set MyRegExp = New RegExp

    CountMatches = 1
    MyRegExp.Pattern = "(Reg).*(Exp)"

    For Each Item In MyItems
        MySubject = Item.Subject
        If MyRegExp.Test(MySubject) Then
              Item.Move MyDestinationFolder
              CountMatches = CountMatches + 1
        End If
    Next

    MsgBox "The total number of emails moved is: " & CountMatches & "."
End Sub

这表面上工作,但速度很慢 - 相当慢一点比说在Outlook类似的规则,我的i7处理器的机器上旋转起来的球迷。我想知道是否有任何明显的低效这个代码,是否有什么办法使这更有效和更处理器饿了。

regex vba outlook outlook-vba
2个回答
2
投票
  1. 是的,代码是极其低效的 - 你永远不应该通过一个文件夹中的所有项目环。使用Items.Find/FindNextItems.Restrict来完成这项工作。这些方法不支持正则表达式,但(如果你确实需要使用正则表达式),你应该至少使用这些方法来过滤了你的潜在匹配。

Microsoft docs的查询格式和示例。

  1. 另外请注意,您正在使用“每个”环路要修改的非常相同的集合(通过调用移动) - 这将导致你跳过一些项目。总是使用从Items.Count down to 1 step -1向下环(其中的项目是由Items.Restrict返回的返回优选 - 见#1)。

1
投票

我不是一个正则表达式的专家,所以我使用的测试工具来帮助我发展模式。我试图匹配您的模式,并针对一些符合你的拍摄对象的字符串的一些变化。我没想到之前时机不同的模式,但现在我已经补充说,作为一个选项,以我的测试工具。下面的结果并不如我所料。

Pattern        Text                   Duration

(Reg).*(Exp)   xxxRegyyyExpzzz        0.00000216
(Reg).*(Exp)   xxxxRegExpzzz          0.00000212
(Reg).*(Exp)   xxxxxRegyEyyExpzzz     0.00000220
(Reg).*(Exp)   xxxxxxRegyyExyExpzzz   0.00000220

Reg.*Exp       xxxRegyyyExpzzz        0.00000199
Reg.*Exp       xxxxRegExpzzz          0.00000198
Reg.*Exp       xxxxxRegyEyyExpzzz     0.00000204
Reg.*Exp       xxxxxxRegyyExyExpzzz   0.00000205

Reg.*?Exp      xxxRegyyyExpzzz        0.00000205
Reg.*?Exp      xxxxRegExpzzz          0.00000188
Reg.*?Exp      xxxxxRegyEyyExpzzz     0.00000214
Reg.*?Exp      xxxxxxRegyyExyExpzzz   0.00000220

时序VBA程序是困难的,因为背景解释器和OS程序可以显著影响时序。我不得不增加重复次数1000万前的总持续时间,就足以让我考虑的平均期限可靠。

正如你所看到卸下捕获括号节省一点时间,虽然你将需要成千上万的电子邮件,你会发现之前。 “注册”和“EXP”之间的字符只有数似乎没有什么效果。

我不明白为什么前两种模式工作。 .*被认为是贪婪。应该每个字符匹配的字符串或下一个换行的末尾。该模式不应该找到“EXP”,因为它们相匹配的.*。只有懒.*?应该已经停止匹配字符的时候才发现“EXP”。无论是我误解了贪婪与懒惰匹配或VBA正则表达式引擎不把.*贪婪。

我的结论是,正则表达式匹配不是你的日常速度缓慢的原因。我建议你尝试添的建议。 IAmANerd2000添加常规展示蒂姆的建议,但他/她已经因为删除了它。 (我可以看到已删除的答案,因为我的名声超过10K)。也许蒂姆想补充一个答案展示了他的建议。

我举出以下的情况下,你发现它有助于我的测试工具。每个图案和文字,其输出是:

===========================================
   Pattern: "(Reg).*(Exp)"
      Text: "xxxRegyyyExpzzz"
Av Durat'n: 0.00000216
-------------------------------------------
     Match: 1
     Value: "RegyyyExp"
    Length: 9
FirstIndex: 3
  SubMatch: 1 "Reg"
  SubMatch: 2 "Exp"
===========================================

Option Explicit
Sub Test2()

  Dim Patterns As Variant
  Dim Texts As Variant

  Texts = Array("xxxRegyyyExpzzz", _
                "xxxxRegExpzzz", _
                "xxxxxRegyEyyExpzzz", _
                "xxxxxxRegyyExyExpzzz")

  Patterns = Array("(Reg).*(Exp)", _
                   "Reg.*Exp", _
                   "Reg.*?Exp")

  Call TestCapture(Patterns, Texts, True)

End Sub
Sub TestCapture(ByRef Patterns As Variant, ByRef Texts As Variant, _
                Optional ByVal TimeDuration As Boolean = False)

  ' Patterns      an array of patterns to be tested
  ' Texts         an array of text to be matched against the patterns
  ' TimeDuration  if True, record the average duration of the match

  ' Attempts to match each text against each pattern and reports on the result
  ' If TimeDuration is True, repeats the match 10,000,000 times and reports the
  ' average duration so the efficiency of different patterns can be determined

  Dim CountCrnt As Long
  Dim CountMax As Long
  Dim InxM As Long
  Dim InxS As Long
  Dim Matches As MatchCollection
  Dim PatternCrnt As Variant
  Dim RegEx As New RegExp
  Dim TimeEnd As Double
  Dim TimeStart As Double
  Dim SubMatchCrnt As Variant
  Dim TextCrnt As Variant

  With RegEx
    .Global = True         ' Find all matches
    .MultiLine = False     ' Match cannot extend across linebreak
    .IgnoreCase = True

    For Each PatternCrnt In Patterns
     .Pattern = PatternCrnt

      For Each TextCrnt In Texts
        Debug.Print "==========================================="
        Debug.Print "   Pattern: """ & PatternCrnt & """"
        Debug.Print "      Text: """ & TidyTextForDspl(TextCrnt) & """"
        If TimeDuration Then
          CountMax = 10000000
          TimeStart = Timer
        Else
          CountMax = 1
        End If
        For CountCrnt = 1 To CountMax
          If Not .test(TextCrnt) Then
            Debug.Print Space(12) & "Text does not match pattern"
            Exit For
          Else
            Set Matches = .Execute(TextCrnt)
            If CountCrnt = CountMax Then
              TimeEnd = Timer
              If TimeDuration Then
                Debug.Print "Av Durat'n: " & Format((TimeEnd - TimeStart) / CountMax, "0.00000000")
              End If
              If Matches.Count = 0 Then
                Debug.Print Space(12) & "Match but no captures"
              Else
                For InxM = 0 To Matches.Count - 1
                  Debug.Print "-------------------------------------------"
                  With Matches(InxM)
                    Debug.Print "     Match: " & InxM + 1
                    Debug.Print "     Value: """ & TidyTextForDspl(.Value) & """"
                    Debug.Print "    Length: " & .Length
                    Debug.Print "FirstIndex: " & .FirstIndex
                    For InxS = 0 To .SubMatches.Count - 1
                      Debug.Print "  SubMatch: " & InxS + 1 & " """ & _
                                  TidyTextForDspl(.SubMatches(InxS)) & """"
                    Next
                  End With
                Next InxM
              End If
            End If
          End If
        Next CountCrnt
      Next TextCrnt
    Next PatternCrnt
    Debug.Print "==========================================="

  End With

End Sub
Public Function TidyTextForDspl(ByVal Text As String) As String

  ' Tidy Text for dsplay by replacing white space with visible strings:
  '   Replace spaces by          ‹s› or ‹n s›
  '   Replace line feed by       ‹lf› or ‹n lf›
  '   Replace carriage return by ‹cr› or ‹n cr›
  '   Replace tab by             ‹tb› or ‹n tb›
  '   Replace non-break space by ‹nbs› or {n nbs›
  ' Where n is a count if the character repeats

  ' 15Mar16  Coded
  '  3Feb19  Replaced "{" (\x7B) and "}" (\x7D) by "‹" (\u2039) and "›" (\u203A)
  '          on the grounds that the angle quotation marks were not likely to
  '          appear in text to be displayed.

  Dim InsStr As String
  Dim InxWsChar As Long
  Dim NumWsChar As Long
  Dim PosWsChar As Long
  Dim RetnVal As String
  Dim WsCharCrnt As Variant
  Dim WsCharValue As Variant
  Dim WsCharDspl As Variant

  WsCharValue = Array(" ", vbLf, vbCr, vbTab, Chr(160))
  WsCharDspl = Array("s", "lf", "cr", "tb", "nbs")

  RetnVal = Text
  For InxWsChar = LBound(WsCharValue) To UBound(WsCharValue)
    Do While True
      PosWsChar = InStr(1, RetnVal, WsCharValue(InxWsChar))
      If PosWsChar = 0 Then
        Exit Do
      End If
      NumWsChar = 1
      Do While Mid(RetnVal, PosWsChar + NumWsChar, 1) = WsCharValue(InxWsChar)
        NumWsChar = NumWsChar + 1
      Loop
      If NumWsChar = 1 Then
        InsStr = "‹" & WsCharDspl(InxWsChar) & "›"
      Else
        InsStr = "‹" & NumWsChar & WsCharDspl(InxWsChar) & "›"
      End If
      RetnVal = Mid(RetnVal, 1, PosWsChar - 1) & InsStr & Mid(RetnVal, PosWsChar + NumWsChar)
    Loop
  Next

  TidyTextForDspl = RetnVal

End Function
© www.soinside.com 2019 - 2024. All rights reserved.