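I have the following code to fetch a title and an image. The title goes into a text box and the image into a picture box.
In my Windows Forms project I have these imports: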
Imports System
Imports System.Xml
Imports HtmlAgilityPack
Imports System.Net
Imports System.IO
Imports System.Collections.Generic
In the Load handler of my test form, I have:
Public Class scrapper
    Private Sub scrapper_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        ' Enable TLS 1.2 support
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
        ' Web page to scrape
        Dim link As String = "https://www.nextinpact.com/"
        ' Download the page from the link into an HtmlDocument
        Dim doc As HtmlDocument = New HtmlWeb().Load(link)
        ' Select the title
        Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[2]/section/aside/section/div[2]/div/article[1]/div/div/h3/a")
        ' Select the image
        Dim img As HtmlNode = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[2]/div/div[1]/div[5]/div/div[2]/p[1]/a/img")
        If div IsNot Nothing Then
            TextBox1.Text = div.InnerText.Trim()
        End If
        If img IsNot Nothing Then
            ' My attempt, which does not work:
            'PictureBox1.Load(img.OuterHtml.Trim())
        End If
        ' Test with PictureBox2
        PictureBox2.Load("https://cdn2.nextinpact.com/compress/100-76//images/bd/square-linked-media/23647.jpg")
    End Sub
End Class
But I cannot get the image into PictureBox1.
PictureBox2 is there only as a test.
How do I correctly get the image for PictureBox1?
Then you only need to read the URL from the element's attribute.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim doc As HtmlDocument = New HtmlWeb().Load("https://www.nextinpact.com/")
    PictureBox1.LoadAsync(doc.DocumentNode.SelectSingleNode("//div[@id='list_news']//img").Attributes("data-frz-src").Value)
End Sub
Or something similar.
If you are trying to extract the same image that is displayed in PictureBox2, the XPath in the second SelectSingleNode call is incorrect. I would use these instead:
' Select the title
Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("//aside[@id='sideBarIndex']//article//div/div/h3/a")
' Select the image
Dim img As HtmlNode = doc.DocumentNode.SelectSingleNode("//aside[@id='sideBarIndex']//article//img")
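Once the img node is selected, PictureBox.Load still needs a URL string, not the node's OuterHtml (which is the whole `<img ...>` markup). The following is a minimal sketch of one way to get that URL, not a definitive implementation. GetImageUrl and ImageUrlSketch are hypothetical names, and the attribute order is an assumption: it tries the data-frz-src attribute used in the earlier answer first and falls back to src. The sample HTML in Main is made up so the snippet runs without hitting the site.

```vb
Imports System
Imports HtmlAgilityPack

Module ImageUrlSketch
    ' Hypothetical helper: returns the absolute image URL for an <img> node,
    ' or Nothing when the node or a usable attribute is missing.
    Function GetImageUrl(img As HtmlNode, pageUrl As String) As String
        If img Is Nothing Then Return Nothing
        ' Assumption: prefer the lazy-loading attribute, then fall back to src.
        Dim raw As String = img.GetAttributeValue("data-frz-src", "")
        If String.IsNullOrWhiteSpace(raw) Then raw = img.GetAttributeValue("src", "")
        If String.IsNullOrWhiteSpace(raw) Then Return Nothing
        ' Resolve relative paths against the page address; absolute URLs pass through.
        Return New Uri(New Uri(pageUrl), raw.Trim()).AbsoluteUri
    End Function

    Sub Main()
        ' Made-up markup standing in for the downloaded page.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<aside id='sideBarIndex'><article><img src='/images/sample.jpg'></article></aside>")
        Dim img As HtmlNode = doc.DocumentNode.SelectSingleNode("//aside[@id='sideBarIndex']//article//img")
        Console.WriteLine(GetImageUrl(img, "https://www.nextinpact.com/"))
    End Sub
End Module
```

In the form, the result could then be checked and passed on, for example `Dim url = GetImageUrl(img, link)` followed by `If url IsNot Nothing Then PictureBox1.Load(url)`. Checking for Nothing avoids the NullReferenceException that a chained `.Attributes("...").Value` call throws when the node or attribute is absent.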