Get the substring between two indices of an HTML fragment

Question

What is the best way to get a substring from a URL in PowerShell?

Given

<a href="http://somehost.aa.com/something?id=12345">http://somehost.aa.com/something?id=12345</a>

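I need to use PowerShell to extract the id value (12345 in this example) into a variable. I don't know whether the length of that substring will stay the same, so it could be 123 or it could be 12345689AABCD.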

I do know that the URL inside href="" and the link text (<a>link text</a>) will be the same, i.e. the link (if that matters).

I tried using the IndexOf method, but I'm not sure whether there is a way to get the substring between two indices.

Answer 1

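There is a quick-and-dirty regex solution that uses PowerShell's -match operator. Note, however, that regex-based HTML parsing is invariably limited and brittle, and should be confined to simple cases such as this one: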

$htmlFragment = '<a href="http://somehost.aa.com/something?id=12345">http://somehost.aa.com/something?id=12345</a>'

# -> '12345'
$id = 
  if ($htmlFragment -match '\?id=(\w+)') { 
    # Match found; output the value of the capture group that captured the ID.
    $Matches.1 
  }
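
The question also asks whether a substring can be taken between two indices. A minimal sketch of that index-based approach follows. It assumes that the ID always follows the literal text ?id= and runs up to the double quote that closes the href attribute; both are assumptions about the input, and the variable names are only illustrative:

```powershell
$htmlFragment = '<a href="http://somehost.aa.com/something?id=12345">http://somehost.aa.com/something?id=12345</a>'

# Marker that precedes the ID (assumed to occur first inside the href attribute).
$marker = '?id='

$id = $null
$markerIndex = $htmlFragment.IndexOf($marker)
if ($markerIndex -ge 0) {
  # Index of the first character of the ID: just past the marker.
  $start = $markerIndex + $marker.Length
  # Index of the double quote that closes the href attribute, searching from $start.
  $end = $htmlFragment.IndexOf('"', $start)
  if ($end -ge 0) {
    # Substring() takes a start index and a LENGTH, not an end index,
    # so pass the difference between the two indices.
    $id = $htmlFragment.Substring($start, $end - $start)
  }
}

$id  # -> '12345'
```

Because the end index is found by searching rather than hard-coded, the length of the ID does not matter. The approach is as brittle as the regex, though: it stops working as soon as the attribute uses single quotes or another parameter follows the ID.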

A proper, robust HTML-parsing solution requires more work:

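Note:

The solution below relies on the ConvertFrom-Html cmdlet from the third-party PSParseHTML module, which you can install with Install-Module PSParseHtml, for example. By default, the module uses the HTML-parsing API of the HAP (HTML Agility Pack), as shown below.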

# Needed for [System.Web.HttpUtility]::ParseQueryString()
Add-Type -AssemblyName System.Web

$htmlFragment = '<a href="http://somehost.aa.com/something?id=12345">http://somehost.aa.com/something?id=12345</a>'

# Parse the fragment into a HTML DOM.
$node = ConvertFrom-Html -Content $htmlFragment

# Extract the element's inner text, containing the URL.
$url = $node.InnerText

# Parse the URL and its query-string part, then extract the 'id' entry.
# -> '12345'
$id = 
  [System.Web.HttpUtility]::ParseQueryString(
    ([uri] $url).Query
  ).Get('id')
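
If the URL is already available as a plain string, the query-string step works on its own, without the third-party module. The sketch below wraps that step in a helper; the function name Get-UrlQueryValue and its parameters are made up for illustration and are not part of any module. The second call uses the longer ID from the question to show that the length of the value does not matter:

```powershell
# Needed for [System.Web.HttpUtility]::ParseQueryString()
Add-Type -AssemblyName System.Web

# Hypothetical helper (illustration only): returns the value of one
# query-string parameter of a URL, or $null if the parameter is absent.
function Get-UrlQueryValue {
  param(
    [Parameter(Mandatory)] [uri] $Url,
    [Parameter(Mandatory)] [string] $Name
  )
  # .Query is the '?...' part of the URL; ParseQueryString() turns it
  # into a name/value collection that can be looked up by name.
  [System.Web.HttpUtility]::ParseQueryString($Url.Query).Get($Name)
}

Get-UrlQueryValue -Url 'http://somehost.aa.com/something?id=12345' -Name id          # -> '12345'
Get-UrlQueryValue -Url 'http://somehost.aa.com/something?id=12345689AABCD' -Name id  # -> '12345689AABCD'
```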
