I need to search the page at https://www.windwardstudios.com/version/ for a download URL such as https://www.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip, and then download that version using PowerShell.
So what I need to match is
https://<anything>/JavaRESTfulEngine<anything>.zip
First, I tried
$regexPattern = 'https://cdn\.windwardstudios\.com/Archive/\d{2}\.X/\d+\.\d+\.\d+/JavaRESTfulEngine-.*?\.zip'
This worked and gave me the URL I needed.
To generalize it further, I tried
$regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
but now it does not work.
Below is my PowerShell script.
# URL of the website to scrape
$websiteUrl = 'https://www.windwardstudios.com/version/version-downloads'
# Use Invoke-WebRequest to fetch the web page content
$response = Invoke-WebRequest -Uri $websiteUrl
# Check if the request was successful
if ($response.StatusCode -eq 200) {
    # Parse the HTML content to find the zip file URL using a regular expression
    $htmlContent = $response.Content
    $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
    $zipFileUrls = [regex]::Matches($htmlContent, $regexPattern) | ForEach-Object { $_.Value }
    if ($zipFileUrls.Count -gt 0) {
        Write-Host "Found zip file URLs:"
        $zipFileUrls | ForEach-Object { Write-Host $_ }
    } else {
        Write-Host "Zip file URLs not found on the page."
    }
} else {
    Write-Host "Failed to fetch the web page. Status code: $($response.StatusCode)"
}
Output:
Zip file URLs not found on the page.
Desired output:
https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip
Can you suggest a fix?
You can use
https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip
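See the regex demo.
Details:
- https://cdn\.windwardstudios\.com/Archive/ - the literal string https://cdn.windwardstudios.com/Archive/
- (\S+?) - Group 1: one or more non-whitespace characters, as few as possible. Unlike [^/]+, this can match across / characters, so it covers both path segments in 23.X/23.3.0
- /JavaRESTfulEngine- - the literal string /JavaRESTfulEngine-
- .*? - any zero or more characters other than line break characters, as few as possible
- \.zip - the literal string .zip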
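To see why the generalized pattern from the question finds nothing, it may help to run both patterns against a single sample line. The sketch below is only an illustration: the $sample string is a made-up HTML fragment built around the URL from the desired output, not the real page content, and the variable names are placeholders.

```powershell
# Hypothetical HTML fragment wrapping the URL from the question's desired output.
$sample = '<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">Download</a>'

# Pattern from the question: [^/]+ cannot cross a "/", so it covers only ONE path segment.
$oneSegment = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'

# Pattern from this answer: \S+? may include "/" and so can span several segments.
$anySegments = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

$failed  = [regex]::Matches($sample, $oneSegment)
$matched = [regex]::Matches($sample, $anySegments)

Write-Host "One-segment pattern matches: $($failed.Count)"
Write-Host "Any-segment pattern matches: $($matched.Count)"
foreach ($m in $matched) {
    Write-Host "URL:     $($m.Value)"
    Write-Host "Group 1: $($m.Groups[1].Value)"
}
```

With this sample, the first pattern finds no match because two segments (23.X and 23.3.0) sit between Archive/ and /JavaRESTfulEngine-, while the second pattern matches and captures 23.X/23.3.0 in Group 1. Once a URL has been extracted from the real page, it could then be passed to Invoke-WebRequest with the -OutFile parameter to save the zip locally.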