Find a URL in web page content using PowerShell


I need to use PowerShell to find the version download URL https://www.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip on the page https://www.windwardstudios.com/version/version-downloads.

So I need to match:

https://<anything>/JavaRESTfulEngine<anything>.zip

First, I tried

$regexPattern = 'https://cdn\.windwardstudios\.com/Archive/\d{2}\.X/\d+\.\d+\.\d+/JavaRESTfulEngine-.*?\.zip'
This worked and gave me the URL I needed.

To generalize it further, I tried

$regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
but now it no longer matches anything.
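A quick offline check makes the difference visible. This is a minimal sketch that only reuses the sample URL and the two patterns above (no network access); it assumes the URL appears in the page exactly as shown in the desired output.

```powershell
# Sample URL from the desired output; no web request is made here.
$sample = 'https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip'

# The specific pattern that worked and the generalized pattern that did not.
$specific = 'https://cdn\.windwardstudios\.com/Archive/\d{2}\.X/\d+\.\d+\.\d+/JavaRESTfulEngine-.*?\.zip'
$general  = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'

# [^/]+ can match "23.X" but cannot continue past the slash into "23.3.0",
# so the generalized pattern never reaches "/JavaRESTfulEngine-".
$specificMatches = $sample -match $specific
$generalMatches  = $sample -match $general

"Specific pattern matches: $specificMatches"
"General pattern matches:  $generalMatches"
```

The first line reports True and the second False, which points at the [^/]+ group rather than at the web request.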

Below is my PowerShell script:

# URL of the website to scrape
$websiteUrl = 'https://www.windwardstudios.com/version/version-downloads'

# Use Invoke-WebRequest to fetch the web page content
$response = Invoke-WebRequest -Uri $websiteUrl

# Check if the request was successful
if ($response.StatusCode -eq 200) {
    # Parse the HTML content to find the zip file URL using a regular expression
    $htmlContent = $response.Content
    $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
    $zipFileUrls = [regex]::Matches($htmlContent, $regexPattern) | ForEach-Object { $_.Value }

    if ($zipFileUrls.Count -gt 0) {
        Write-Host "Found zip file URLs:"
        $zipFileUrls | ForEach-Object { Write-Host $_ }
    } else {
        Write-Host "Zip file URLs not found on the page."
    }
} else {
    Write-Host "Failed to fetch the web page. Status code: $($response.StatusCode)"
}

Output:

Zip file URLs not found on the page.

Desired output:

https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip

Can you suggest how to fix this?

regex powershell string-matching web-content
Answer

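Your generalized pattern fails because [^/]+ cannot match a slash, while the text between Archive/ and /JavaRESTfulEngine- spans two path segments (23.X/23.3.0). Use a group that can cross slashes instead. You can use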

https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip

See the regex demo.

Details:

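  • https://cdn\.windwardstudios\.com/Archive/ - the literal string https://cdn.windwardstudios.com/Archive/
  • (\S+?) - Group 1: one or more non-whitespace characters, as few as possible
  • /JavaRESTfulEngine- - the literal string /JavaRESTfulEngine-
  • .*? - zero or more characters other than line breaks, as few as possible
  • \.zip - the literal string .zip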
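A minimal sketch of the suggested pattern in use, assuming a hypothetical HTML fragment in place of the real $response.Content (the live page markup may differ). It also shows how Group 1 captures the version folders:

```powershell
# Hypothetical stand-in for $response.Content; the real page may be structured differently.
$htmlContent = @'
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">Java RESTful Engine</a>
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/SomethingElse-23.3.0.32.zip">Other</a>
'@

# Pattern from the answer: (\S+?) may span several path segments.
$regexPattern = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

$found = [regex]::Matches($htmlContent, $regexPattern)

foreach ($m in $found) {
    "URL:     $($m.Value)"           # the full zip URL
    "Group 1: $($m.Groups[1].Value)" # the folders between Archive/ and the file name
}
```

Because \S+? also matches quotes and angle brackets, on markup where two links are not separated by whitespace a match could start in one href and end in another; a narrower class such as [^"\s]+? helps keep the match inside a single attribute value.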