Removing a string that contains CRLF characters from an email body

Question (votes: 0, answers: 2)

I am trying to remove a string from selected incoming MS Outlook (2016) emails.

The string consists of two sentences in German. I use the Replace() function, and this mostly works. (See the full procedure below.)

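The two sentences are sometimes broken by CRLF (carriage return, line feed) characters, and these do not always appear at the same position. This seems to be a result of the emails passing through various devices before they reach my Outlook inbox.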

Solving the simpler part of the problem first

Before tackling CRLFs at varying positions, I want to write a procedure that handles a string whose CRLFs are at fixed positions.

This is what the source of such a string looks like: [screenshot]

(How the screenshot was made: I saved the email as an .html file on my hard drive, then opened the .html file in Notepad++ to see the CRLF characters.)

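I don't care about the HTML tags; they can stay in the email. (In fact the formatting tags vary as well, so it is better not to touch them at all.) My only concern is removing the visible part, i.e. the text "Diese E-Mail kommt ... vertrauenswürdig halten".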

I tried to capture the text including its line breaks by writing the CR LF parts as Chr():

strDelete01 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge," & Chr(13) & Chr(10) & "wenn Sie die Personen für vertrauenswürdig halten."

My procedure does not recognise the string, so it does nothing.

My script so far

Public Sub EditBodyCgReplace()

'Declarations
   Dim obj As Object
   Dim Sel As Outlook.Selection
   Dim DoSave As Boolean
   Dim NewBody As String
   Dim strDelete01 As String
   Dim strDelete02 As String
   Dim strDelete03 As String
   Dim strDelete04 As String

'Fill the variables 
   strDelete01 = "Diese E-Mail kommt von Personen außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge, wenn Sie die Personen für vertrauenswürdig halten."
   strDelete02 = "################################################################################"
   strDelete03 = "<hr>"
   strDelete04 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge," & Chr(13) & Chr(10) & "wenn Sie die Personen für vertrauenswürdig halten."

'Note: I am playing here with various types of strings at once. For example, 
'the procedure will also remove <hr> lines and "#####" strings  

'Work with it 
    If TypeOf Application.ActiveWindow Is Outlook.Inspector Then
        Set obj = Application.ActiveInspector.CurrentItem
    Else
        Set Sel = Application.ActiveExplorer.Selection
        If Sel.Count Then
            Set obj = Sel(1)
            DoSave = True
        End If
    End If

    If Not obj Is Nothing Then
        NewBody = Replace(obj.HTMLBody, strDelete01, "")
        NewBody = Replace(obj.HTMLBody, strDelete02, "")
        NewBody = Replace(obj.HTMLBody, strDelete03, "")
        NewBody = Replace(obj.HTMLBody, strDelete04, "")

        If NewBody <> "" Then
            obj.HTMLBody = NewBody
            If DoSave Then
                obj.Save
            End If
        End If
    End If
    End Sub

Question: How do I include CRLF in the search string?

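Follow-up question: How can I remove a string that contains CRLFs at varying positions? Is there a way to use regular expressions, and can VBA in Outlook handle them? My idea: with a regular expression the whole CRLF problem would disappear, because the expression would look something like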

"Diese E-Mail kommt von * vertrauenswürdig halten."

and would therefore match anything in between, including CRLFs?

Possibly important

After various experiments, I am starting to suspect that MS Outlook does not use HTML in its emails at all.

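(I have noticed that I can hardly address any of the HTML code in obj.HTMLBody. I can work with the plain text. I did manage to address certain parts of the HTML, such as "<hr>", and delete them, but I cannot recreate the conditions that worked yesterday.)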

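I can save the emails as .html files (outside Outlook, in a separate folder on my hard drive), and in those files I do see the CRLFs and everything else. But perhaps emails are stored in some other code for as long as they stay inside Outlook itself?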

If so, what is that code, and how can I address the part of it I want to delete?

outlook-vba
2 answers

1 vote

I have some time now, so I will answer part of your question. Someone else may post a fuller answer before I do.

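I have edited your question. There were several sentences I did not understand, so I looked at the source and found my guess was correct: you had typed less-than characters. Stack Overflow allows a limited set of HTML tags, and anything else that looks like an HTML tag is ignored. I replaced every "<" with "&lt;".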

You have:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(obj.HTMLBody, strDelete02, "")
NewBody = Replace(obj.HTMLBody, strDelete03, "")
NewBody = Replace(obj.HTMLBody, strDelete04, "")
If NewBody <> "" Then

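Every Replace except the first overwrites the value of NewBody created by the previous Replace, because each one starts again from obj.HTMLBody. Only the last Replace (for strDelete04) survives. You seem to think that NewBody will be empty if strDelete04 is not found. It will not: if strDelete04 is not found, NewBody will be an unchanged copy of obj.HTMLBody.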

You need something like:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(NewBody, strDelete02, "")
NewBody = Replace(NewBody, strDelete03, "")
NewBody = Replace(NewBody, strDelete04, "")
If NewBody <> obj.HTMLBody Then
  ' One or more delete strings found and removed
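With several delete strings, the same cumulative pattern can be wrapped in a small helper. `RemoveStrings` is a hypothetical name used only for this sketch. It feeds each result into the next Replace, so no deletion is lost. It assumes the delete strings are plain literal text, not patterns.

```vba
' Hypothetical helper: remove every listed string from Body, applying each
' Replace to the result of the previous one rather than to the original text.
Function RemoveStrings(ByVal Body As String, ParamArray Deletes() As Variant) As String
  Dim Inx As Long
  For Inx = LBound(Deletes) To UBound(Deletes)   ' empty list: loop body never runs
    Body = Replace(Body, CStr(Deletes(Inx)), "")
  Next Inx
  RemoveStrings = Body
End Function
```

Usage, in place of the four Replace lines: `NewBody = RemoveStrings(obj.HTMLBody, strDelete01, strDelete02, strDelete03, strDelete04)`, followed by the `If NewBody <> obj.HTMLBody Then` test.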

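You say the CRLFs are not at fixed positions. If so, no simple modification of your code will do what you want. I will show you how to achieve it, but first I need to create some emails containing your text so I can test my code.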
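For CRLFs at arbitrary positions, the regular-expression idea from the question is workable in Outlook VBA through the VBScript regular-expression engine (`VBScript.RegExp`). The following is a minimal sketch under stated assumptions and has not been tested against the actual emails. `RemoveWarningRegex` is a hypothetical name. `[\s\S]*?` matches any characters, CR and LF included, as few as possible, so the match ends at the first "vertrauenswürdig halten." and not the last. The pattern also accepts `&uuml;`, on the assumption that the HTML may encode the ü as an entity. If formatting tags sit inside the warning text, they are removed with it, which could leave unbalanced tags, so check the result.

```vba
' Minimal sketch (hypothetical helper): remove the warning wherever its line
' breaks fall. Late binding means no reference to
' "Microsoft VBScript Regular Expressions 5.5" is required.
Function RemoveWarningRegex(ByVal Body As String) As String
  Dim Rgx As Object
  Dim UUml As String

  ' Assumption: the u-umlaut may appear literally or as the entity &uuml;
  UUml = "(?:" & ChrW(252) & "|&uuml;)"

  Set Rgx = CreateObject("VBScript.RegExp")
  With Rgx
    .Global = True       ' remove every occurrence, not just the first
    .IgnoreCase = False
    ' \s+ between words tolerates a space, a CRLF or both;
    ' [\s\S]*? matches anything, including CR and LF, as little as possible
    .Pattern = "Diese\s+E-Mail\s+kommt\s+von[\s\S]*?" & _
               "vertrauensw" & UUml & "rdig\s+halten\."
  End With

  RemoveWarningRegex = Rgx.Replace(Body, "")
End Function
```

Usage: `NewBody = RemoveWarningRegex(obj.HTMLBody)`, then compare NewBody with obj.HTMLBody before saving, as above.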

Part 2

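Having looked more closely at your HTML image, I believe there is a simple solution. The two CRLFs in the text take the place of spaces. As long as that is always the case, you can use: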

NewBody = Replace(obj.HTMLBody, vbCr & vbLf, " ")

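This replaces every CRLF in the HTML with a space. When an HTML document is displayed, any run of whitespace characters (including CR and LF) is rendered as a single space, except inside elements such as <pre>, so it does not matter if there were extra CRLFs.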

Then remove the unwanted text with:

Dim strDelete As String
strDelete = "Diese E-Mail kommt von Personen außerhalb " & _
            "der Stadtverwaltung. Klicken Sie nur auf " & _
            "Links oder Dateianhänge, wenn Sie die Personen " & _
            "für vertrauenswürdig halten."

NewBody = Replace(NewBody, strDelete, "")
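The two steps above can be combined into one macro. The sketch below goes one step further than the answer, on the assumption that a line break may sometimes be inserted next to a space instead of replacing it: lone CRs and LFs also become spaces, and runs of spaces are collapsed, so the one-line delete string still matches. `NormaliseWhitespace` and `RemoveWarningFromSelectedEmail` are hypothetical names. Collapsing whitespace also reflows `<pre>` blocks and any other place where whitespace is significant.

```vba
' Hypothetical helper: turn every line break into a space, then collapse
' runs of spaces to a single space.
Function NormaliseWhitespace(ByVal Text As String) As String
  Text = Replace(Text, vbCrLf, " ")   ' CRLF pairs first
  Text = Replace(Text, vbCr, " ")     ' then any lone CR
  Text = Replace(Text, vbLf, " ")     ' then any lone LF
  Do While InStr(Text, "  ") > 0      ' collapse runs of spaces
    Text = Replace(Text, "  ", " ")
  Loop
  NormaliseWhitespace = Text
End Function

' Hypothetical macro: remove the warning from the first selected email.
Public Sub RemoveWarningFromSelectedEmail()
  Dim Itm As Object
  Dim Normalised As String
  Dim strDelete As String

  strDelete = "Diese E-Mail kommt von Personen außerhalb " & _
              "der Stadtverwaltung. Klicken Sie nur auf " & _
              "Links oder Dateianhänge, wenn Sie die Personen " & _
              "für vertrauenswürdig halten."

  If Application.ActiveExplorer.Selection.Count = 0 Then Exit Sub
  Set Itm = Application.ActiveExplorer.Selection(1)
  If Itm.Class <> olMail Then Exit Sub

  Normalised = NormaliseWhitespace(Itm.HTMLBody)
  If InStr(Normalised, strDelete) > 0 Then
    ' Write back only when the text was found, so other emails stay untouched
    Itm.HTMLBody = Replace(Normalised, strDelete, "")
    Itm.Save
  Else
    Debug.Print "Text not found in: " & Itm.Subject
  End If
End Sub
```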

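If the above does not work, you need a better diagnostic technique. Saving the whole email as HTML may be easy, but you cannot be certain how the result differs from what your VBA macro sees. You wonder whether Outlook stores emails in a format other than HTML. I cannot imagine why Outlook would convert an incoming SMTP message into some secret format and then convert it back whenever the user wants to view it. If Outlook does have a secret format, it is completely hidden from VBA programmers.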
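For a quicker check than saving the message to disk, Outlook can report on the item directly. The sketch below is a hypothetical macro, `ReportBodyFormat`. It prints the item's `BodyFormat` and the position, if any, at which a short fragment of the warning appears in `HTMLBody` and in the plain-text `Body` (`InStr` returns 0 when the fragment is absent). The probe text is an assumption; any short piece of the warning will do. If the fragment appears in `Body` but not in `HTMLBody`, the HTML probably splits the text with line breaks, tags or entities such as `&uuml;`.

```vba
' Hypothetical diagnostic: report the body format of the first selected item
' and where a probe string occurs in HTMLBody and in Body.
Public Sub ReportBodyFormat()
  Dim Itm As Object
  Dim Probe As String

  Probe = "Diese E-Mail kommt von"   ' assumption: any short fragment of the text

  If Application.ActiveExplorer.Selection.Count = 0 Then Exit Sub
  Set Itm = Application.ActiveExplorer.Selection(1)
  If Itm.Class <> olMail Then Exit Sub

  Select Case Itm.BodyFormat
    Case olFormatHTML:     Debug.Print "BodyFormat: HTML"
    Case olFormatRichText: Debug.Print "BodyFormat: Rich Text"
    Case olFormatPlain:    Debug.Print "BodyFormat: Plain text"
    Case Else:             Debug.Print "BodyFormat: unspecified"
  End Select

  Debug.Print "Position in HTMLBody: " & InStr(Itm.HTMLBody, Probe)
  Debug.Print "Position in Body:     " & InStr(Itm.Body, Probe)
End Sub
```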

Below is a simple version of the diagnostic tool I use. If you need something more advanced I can provide it, but let's try this first.

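Copy the code below into an Outlook module. Select one of these emails, then run the macro DsplHtmlBodyFromSelectedEmails. The entire HTML body of the email will be written to the Immediate Window in a readable format. I believe I have included every subroutine the macro calls. If not, I apologise in advance: if you get a message about an undefined routine, tell me and I will add it to the answer.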

Sub DsplHtmlBodyFromSelectedEmails()

  ' Select one or more emails, then run this macro. For each selected email,
  ' the Received Time, the Subject and the Html body are output to the
  ' Immediate Window. Note: the Immediate Window can only display about
  ' 200 lines before the older lines are lost.

  Dim Exp As Explorer
  Dim Html As String
  Dim ItemCrnt As Object   ' Not MailItem: a selection can also hold meeting requests, reports, etc.

  Set Exp = Outlook.Application.ActiveExplorer

  If Exp.Selection.Count = 0 Then
    Call MsgBox("Please select one or more emails then try again", vbOKOnly)
    Exit Sub
  Else
    For Each ItemCrnt In Exp.Selection
      With ItemCrnt
        If .Class = olMail Then
          Debug.Print .ReceivedTime & " " & .Subject
          Html = ""   ' Clear output left over from a previously selected email
          Call OutLongTextRtn(Html, "Html", .HtmlBody)
          Debug.Print Html
        End If
      End With
    Next
  End If

End Sub
Sub OutLongTextRtn(ByRef TextOut As String, ByVal Head As String, _
                          ByVal TextIn As String)

  ' * Break TextIn into lines of not more than 100 characters
  '   and append to TextOut.
  ' * The output is arranged so:
  '     xxxx|sssssssssssssss|
  '         |sssssssssssssss|
  '         |ssssssssss|
  '   where "xxxx" is the value of Head and "ssss..." are characters from
  '         TextIn.  The third line in the example could be shorter because:
  '           * it contains the last few characters of TextIn
  '           * there is a linefeed in TextIn
  '           * a <xxx> string recording whitespace would have been split
  '             across two lines.

  If TextIn = "" Then
    ' Nothing to do
    Exit Sub
  End If

  Const LenLineMax As Long = 100

  Dim PosBrktEnd As Long     ' Last > before PosEnd
  Dim PosBrktStart As Long   ' Last < before PosEnd
  Dim PosNext As Long        ' Start of block to be output after current block
  Dim PosStart As Long       ' First character of TextIn not yet output

  TextIn = TidyTextForDspl(TextIn)
  TextIn = Replace(TextIn, "lf›", "lf›" & vbLf)

  PosStart = 1
  Do While True
    PosNext = InStr(PosStart, TextIn, vbLf)
    If PosNext = 0 Then
      ' No LF in [Remaining] TextIn
      'Debug.Assert False
      PosNext = Len(TextIn) + 1
    End If
    If PosNext - PosStart > LenLineMax Then
      PosNext = PosStart + LenLineMax
    End If
    ' Check for <xxx> being split across lines
    PosBrktStart = InStrRev(TextIn, "‹", PosNext - 1)
    PosBrktEnd = InStrRev(TextIn, "›", PosNext - 1)
    If PosBrktStart < PosStart And PosBrktEnd < PosStart Then
      ' No <xxx> within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And PosBrktEnd > 0 And PosBrktEnd > PosBrktStart Then
      ' Last or only <xxx> totally within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And _
           (PosBrktEnd = 0 Or (PosBrktEnd > 0 And PosBrktEnd < PosBrktStart)) Then
      ' Last or only <xxx> will be split across rows
      'Debug.Assert False
      PosNext = PosBrktStart
    Else
      ' Are there other combinations?
      Debug.Assert False
    End If

    'Debug.Assert Right$(Mid$(TextIn, PosStart, PosNext - PosStart), 1) <> "‹"

    If TextOut <> "" Then
      TextOut = TextOut & vbLf
    End If
    If PosStart = 1 Then
      TextOut = TextOut & Head & "|"
    Else
      TextOut = TextOut & Space(Len(Head)) & "|"
    End If
    TextOut = TextOut & Mid$(TextIn, PosStart, PosNext - PosStart) & "|"
    PosStart = PosNext
    If Mid$(TextIn, PosStart, 1) = vbLf Then
      PosStart = PosStart + 1
    End If
    If PosStart > Len(TextIn) Then
      Exit Do
    End If
  Loop

End Sub
Function TidyTextForDspl(ByVal Text As String) As String

  ' Tidy Text for display by replacing white space with visible strings:
  '   Leave single space unchanged
  '   Replace single LF by                 ‹lf›
  '   Replace single CR by                 ‹cr›
  '   Replace single TB by                 ‹tb›
  '   Replace single non-break space by    ‹nbs›
  '   Replace single CRLF by               ‹crlf›
  '   Replace multiple spaces by           ‹n s›       where n is number of repeats
  '   Replace multiple LFs by              ‹n lf›      of white space character
  '   Replace multiple CRs by              ‹n cr›
  '   Replace multiple TBs by              ‹n tb›
  '   Replace multiple non-break spaces by ‹n nbs›
  '   Replace multiple CRLFs by            ‹n crlf›

  Dim InsStr As String
  Dim InxWsChar As Long
  Dim NumWsChar As Long
  Dim PosWsChar As Long
  Dim RetnVal As String
  Dim WsCharCrnt As Variant
  Dim WsCharValue As Variant
  Dim WsCharDspl As Variant

  WsCharValue = VBA.Array(" ", vbCr & vbLf, vbLf, vbCr, vbTab, Chr(160))
  WsCharDspl = VBA.Array("s", "crlf", "lf", "cr", "tb", "nbs")

  RetnVal = Text

  ' Replace each whitespace individually
  For InxWsChar = 0 To UBound(WsCharValue)
    RetnVal = Replace(RetnVal, WsCharValue(InxWsChar), "‹" & WsCharDspl(InxWsChar) & "›")
  Next

  ' Look for repeats. If found replace <x> by <n x>
  For InxWsChar = 0 To UBound(WsCharValue)
    'Debug.Assert InxWsChar <> 1
    PosWsChar = 1
    Do While True
      InsStr = "‹" & WsCharDspl(InxWsChar) & "›"
      PosWsChar = InStr(PosWsChar, RetnVal, InsStr & InsStr)
      If PosWsChar = 0 Then
        ' No [more] repeats of this <x>
        Exit Do
      End If
      ' Have <x><x>.  Count number of extra <x>s
      NumWsChar = 2
      Do While Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr), Len(InsStr)) = InsStr
        NumWsChar = NumWsChar + 1
      Loop
      RetnVal = Mid(RetnVal, 1, PosWsChar - 1) & _
                "‹" & NumWsChar & " " & WsCharDspl(InxWsChar) & "›" & _
                Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr))
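      ' Skip past the new ‹n x› token. Len() of a Long gives its storage size (4),
      ' not its digit count, so convert to a string first.
      PosWsChar = PosWsChar + Len(CStr(NumWsChar)) + Len(WsCharDspl(InxWsChar)) + 3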

    Loop
  Next

  ' Restore any single spaces
  RetnVal = Replace(RetnVal, "‹" & WsCharDspl(0) & "›", " ")

  TidyTextForDspl = RetnVal

End Function

1 vote

My complete diagnostic program

© www.soinside.com 2019 - 2024. All rights reserved.