Using VBA to render text with HTML markup as formatted text in a Word table

Question

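I have a Word document that contains HTML tags, and I need to convert it to formatted text. For example, I want <strong>Hello</strong> to display as Hello in bold.

I have never used VBA before, but I have been trying to piece something together so that I can copy the HTML text from a specific table cell in Word, use IE to display a formatted version of that text, copy the formatted text from IE, and then paste it back into the same Word table cell. I think I've managed to work out some of the code, but I don't think I'm referring to the table cell correctly. Can anyone help? This is what I have so far: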

Dim Ie As Object

Set Ie = CreateObject("InternetExplorer.Application")

With Ie
    .Visible = False

    .Navigate "about:blank"

    .Document.body.InnerHTML = ActiveDocument.Tables(1).Cell(2, 2)

    .Document.execCommand "SelectAll"

    .Document.execCommand "Copy"

    ActiveDocument.Paste Destination = ActiveDocument.Tables(1).Cell(2, 2)

    .Quit
End With

End Sub

Answer 1

You need two different approaches for the two uses of .Cell(2, 2).

To get the text from the cell, change the first of those lines so that it reads

.Document.body.InnerHTML = ActiveDocument.Tables(1).Cell(2, 2).range.text

In the second case your syntax is incorrect. It should read

ActiveDocument.Tables(1).Cell(2, 2).range.paste
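
Putting the two corrections together, the whole routine might look like the untested sketch below. The Sub name, the rng variable, and the wait loop are my own additions rather than part of the question. It assumes Internet Explorer is still installed and allows scripted clipboard access, which may not hold on current Windows versions. The range is shortened by one character so that the end-of-cell marker is not passed to the browser.

```vba
Sub RenderCellHtml()   ' hypothetical name, not from the question
    Dim ie As Object
    Dim rng As Range

    ' Work on the cell's range, excluding the end-of-cell marker
    Set rng = ActiveDocument.Tables(1).Cell(2, 2).Range
    rng.MoveEnd Unit:=wdCharacter, Count:=-1

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = False
        .Navigate "about:blank"

        ' Assumption: wait until the blank page has loaded (4 = READYSTATE_COMPLETE)
        Do While .Busy Or .ReadyState <> 4
            DoEvents
        Loop

        ' Let the browser render the HTML, then copy the rendered result
        .Document.body.innerHTML = rng.Text
        .Document.execCommand "SelectAll"
        .Document.execCommand "Copy"
        .Quit
    End With

    ' Replace the raw HTML in the cell with the formatted clipboard contents
    rng.Paste
End Sub
```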

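You can get help on individual keywords and properties very easily. In the VBA IDE, just place the cursor on a keyword or property and press F1. You will be taken to the MS help page for that keyword or property. Sometimes, if there is more than one alternative, you will have an extra selection step.

You should also be aware that the .Cell(row, column) method is prone to failure because it relies on the table having no merged cells. A more robust approach is to use the .Cells(index) property.

You may also be able to use an alternative approach: use a wildcard search to find the tags, then replace the part you want while applying a suitable linked style (you can't use paragraph styles, because you would be formatting only part of a paragraph, and character styles don't seem to work with find/replace).

Below is an example of such code, which removes the HTML tags and formats the remaining text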
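
As a rough illustration of the index-based approach, the sketch below walks every cell through the table's Range.Cells collection, which does not depend on row and column coordinates. The Sub name and variable names are made up for the example, and it assumes the document has at least one table.

```vba
Sub ListCellTexts()   ' hypothetical name, for illustration only
    Dim oCell As Cell
    Dim cellText As String

    ' Range.Cells enumerates the cells in order, merged or not
    For Each oCell In ActiveDocument.Tables(1).Range.Cells
        cellText = oCell.Range.Text
        ' Drop the two-character end-of-cell marker (Chr(13) & Chr(7))
        cellText = Left$(cellText, Len(cellText) - 2)
        Debug.Print oCell.RowIndex, oCell.ColumnIndex, cellText
    Next oCell
End Sub
```

A single cell can be reached the same way with ActiveDocument.Tables(1).Range.Cells(n), where n is its position in that collection.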

Option Explicit

Sub replaceHTML_WithFormattedText()

' a comma separated list of HTML tags
Const myTagsList                          As String = "strong,small,b,i,em"

' a list of linked styles chosen or designed for each tag
' Paragraph  styles cannot be used as we are replacing only part of a paragraph
' Character styles just don't seem to work
' The linked styles below were just chosen from the default Word styles as an example
Const myStylesList                        As String = "Heading 1,Heading 9,Comment Subject,Intense Quote,Message Header"

' <, > and / are special characters therefore need escaping with '\' to get the actual character
Const myFindTag                           As String = "(\<Tag\>)(*)(\<\/Tag\>)"
Const myReplaceStr                        As String = "\2"

Dim myTagsHTML()                        As String
Dim myTagsStyles()                      As String
Dim myIndex                             As Long

    myTagsHTML = Split(myTagsList, ",")
    myTagsStyles = Split(myStylesList, ",")

    If UBound(myTagsHTML) <> UBound(myTagsStyles) Then
        MsgBox "Different number of tags and Styles", vbOKOnly
        Exit Sub

    End If

    For myIndex = 0 To UBound(myTagsHTML)

        With ActiveDocument.StoryRanges(wdMainTextStory).Find
            .ClearFormatting
            .Format = True
            .Text = Replace(myFindTag, "Tag", Trim(myTagsHTML(myIndex)))
            .MatchWildcards = True
            .Replacement.Text = myReplaceStr
            .Replacement.Style = myTagsStyles(myIndex)
            .Execute Replace:=wdReplaceAll

        End With

    Next

End Sub

Answer 2

Try something along the lines of:

Sub ReformatHTML()
Application.ScreenUpdating = False
With ActiveDocument.Range.Find
  .ClearFormatting
  .Format = True
  .Forward = True
  .MatchWildcards = True
  .Wrap = wdFindContinue
  .Replacement.Text = "\2"
  .Replacement.ClearFormatting
  .Text = "\<(u\>)(*)\</\1"
  .Replacement.Font.Underline = True
  .Execute Replace:=wdReplaceAll
  .Replacement.ClearFormatting
  .Text = "\<(b\>)(*)\</\1"
  .Replacement.Font.Bold = True
  .Execute Replace:=wdReplaceAll
  .Replacement.ClearFormatting
  .Text = "\<(i\>)(*)\</\1"
  .Replacement.Font.Italic = True
  .Execute Replace:=wdReplaceAll
  .Replacement.ClearFormatting
  .Text = "\<(h\>)(*)\</\1"
  .Replacement.Highlight = True
  .Execute Replace:=wdReplaceAll
End With
Application.ScreenUpdating = True
End Sub

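The macro above uses the 'normal' HTML codes for bold, italic, underline and highlighting.

Since your document seems to use a different convention (style names, perhaps?), you could replace (b\>) in the code with (strong\>). And, if it is meant to relate to Word's own 'Strong' style, you would also change: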

.Replacement.Font.Bold = True

to:

.Replacement.Style = "Strong"
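
For reference, a stripped-down version of the macro with both of those changes applied might look like this. It is an untested sketch: the Sub name is made up, and it assumes the built-in 'Strong' character style is available under that name (it may be localized in non-English versions of Word).

```vba
Sub ReformatStrongTags()   ' hypothetical name
    With ActiveDocument.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Format = True
        .Forward = True
        .MatchWildcards = True
        .Wrap = wdFindContinue
        ' Match <strong>...</strong>; \1 reuses the captured "strong>"
        .Text = "\<(strong\>)(*)\</\1"
        ' Keep only the text between the tags and apply the style to it
        .Replacement.Text = "\2"
        .Replacement.Style = "Strong"
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```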
