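I am scraping version information from a website. I can retrieve the information, but not without the surrounding formatting. The current target is the DIV tag with the ID j_idt19. Is there a way to get the information from the td within the table with the id page_footer? I can't locate the specific TD by its text.
I would like to put the result into a CSV, and also write the text as Num.NumNum.NumNumNum to a text file.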
# Retrieve the page that shows the version
$response = Invoke-WebRequest -Uri "https://www.somesite.com/index.xhtml"
# Select the text content of the DIV with the ID j_idt19
$results1 = $response.ParsedHtml.getElementsByTagName("Div") | Where-Object {$_.id -eq "j_idt19"} | Select-Object -Property TextContent
$results2 = $response.ParsedHtml.getElementsByTagName("Div") | Where-Object {$_.id -eq "j_idt19"} | Select-Object -Property TextContent | Out-String
Write-Output $results1
$results1 | Export-Csv -Path "C:\Users\ASTRTW3\Desktop\David_Scripts\URL_TEST5.csv"
$results2 | Out-File -FilePath "C:\Users\ASTRTW3\Desktop\David_Scripts\URL_TEST5.txt"
HTML being scraped
<div id="j_idt19" class="ui-layout-unit ui-widget ui-widget-content ui-corner-all ui-layout-south ui-layout-pane ui-layout-pane-south" style="position: absolute; margin: 0px; inset: auto 5px 0px; width: auto; z-index: 0; height: 26px; display: block; visibility: visible;"><div class="ui-layout-unit-content ui-widget-content" style="position: relative; height: 22px; visibility: visible;">
<table id="page_footer" style="width: 100%; border-top: 1px solid #cbc3be !important;">
<tbody><tr>
<td style="width: 30%;">
</td>
<td style="width: 40%; text-align: center;"><span style="font-weight: bold;">1.14.012</span>
</td>
<td style="width: 15%; text-align: right;"> </td>
<td style="text-align: right; width: 20px; margin-top: 2px;"><div id="j_idt23" style="width:18px;height:18px;position:fixed;right:130px;bottom:2px"><div id="j_idt23_start" style="display:none"><img id="progressBar" src="/CSDB/resources/images/loader_footer.gif"></div><div id="j_idt23_complete" style="display:none"></div></div>
</td>
</tr>
</tbody></table></div></div>
CSV result
#TYPE Selected.System.__ComObject
"textContent"
"
1.14.012
?
"
Text result
textContent
-----------
...
Expected result
CSV
#TYPE Selected.System.__ComObject
"textContent"
1.14.012
Text
1.14.012
The version is contained in a <span> that sits inside a <td>. In that case, the code you can use is:
$response.ParsedHtml.getElementById('j_idt19') | ForEach-Object {
    $ver = $null
    # Walk every <td> under the target DIV and keep only the <span>
    # elements whose text parses as a version number.
    foreach ($td in $_.getElementsByTagName('td')) {
        $td.getElementsByTagName('span') |
            Where-Object { [version]::TryParse($_.textContent, [ref] $ver) } |
            Select-Object textContent
    }
} | Export-Csv path\to\csv
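The code above covers the CSV; the question also asks for the bare version in a text file. As a rough sketch of one alternative (not the approach used above), the version could be pulled out of the raw markup with a regular expression instead of the parsed DOM, which may help where ParsedHtml is not available. Everything here is an assumption for illustration: the sample markup is a trimmed copy of the HTML in the question standing in for $response.Content, the pattern assumes the version is the first digits.digits.digits value in a <span> after the page_footer table, and the output paths are hypothetical temp-folder locations.

```powershell
# Sample markup trimmed from the question; in the real script this
# would be the raw page text, e.g. $response.Content.
$html = @'
<table id="page_footer" style="width: 100%;">
<tbody><tr>
<td style="width: 40%; text-align: center;"><span style="font-weight: bold;">1.14.012</span>
</td>
</tr>
</tbody></table>
'@

# Hypothetical pattern: first <span> after id="page_footer" whose content
# looks like Num.Num.Num. (?s) lets . match across line breaks.
$pattern = '(?s)id="page_footer".*?<span[^>]*>\s*(\d+\.\d+\.\d+)\s*</span>'

$version = $null
if ($html -match $pattern) {
    # Capture group 1 holds the version text without surrounding whitespace.
    $version = $Matches[1]
}

# Hypothetical output locations; substitute the real paths.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'version_test.csv'
$txtPath = Join-Path ([System.IO.Path]::GetTempPath()) 'version_test.txt'

# One-column CSV with the same header as in the question.
[pscustomobject]@{ textContent = $version } |
    Export-Csv -Path $csvPath -NoTypeInformation

# Plain text file containing only the version string.
Set-Content -Path $txtPath -Value $version
```

Because the value is captured as a plain string, the leading zero in the last component is preserved. The -NoTypeInformation switch omits the #TYPE line; drop the switch if that line is wanted in the CSV.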