I'm scraping a web page: some items are scraped, but others are not, and I don't know why


Some elements on the page cannot be retrieved. How can I scrape them?

Scraped items (2): addrs, a_earths

Item not scraped (1): points

The line near the end, "points = soup.select('.addr_point')", returns nothing. I don't know why (the element is in the red dotted box in the screenshot).

Please advise.

import re
import urllib.request  # urlopen and Request live in urllib.request, not urllib.parse

from bs4 import BeautifulSoup

url = 'http://www.dooinauction.com/auction/ca_list.php'

# First request: read the total number of listings from the page header.
req = urllib.request.Request(url)
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')

tots = soup.select('div.title_left font')  # total
tot = int(re.findall(r'\d+', tots[0].text)[0])
print(f'total : {tot}건')

# Second request: ask for all listings at once.
url = f'http://www.dooinauction.com/auction/ca_list.php?total_record={tot}&search_fm_off=1&search_fm_off=1&start=0'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

addrs = soup.select('.addr')  # scraped OK
a_earths = soup.select('.list_class.bold')  # scraped OK
points = soup.select('.addr_point')  # returns nothing
print(len(addrs), len(a_earths), len(points))


python web-crawler
1 Answer
0 votes

I looked at your site and could not find any element with the addr_point class. I think that may be why the selector returns nothing.
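One way to check this from Python rather than from the browser is to count how often each class name appears in the HTML the server actually returned. The following is only a minimal sketch using the standard library: `ClassCounter`, `count_classes`, and `sample_html` are hypothetical names made up for illustration, and the sample markup is a stand-in, not the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how many start tags carry each CSS class name."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def count_classes(html_text, class_names):
    """Return {class_name: number of start tags with that class}."""
    parser = ClassCounter()
    parser.feed(html_text)
    parser.close()
    return {name: parser.counts[name] for name in class_names}


# Hypothetical sample standing in for the downloaded page (assumption).
sample_html = """
<table>
  <tr><td class="addr">Seoul</td><td class="list_class bold">100</td></tr>
  <tr><td class="addr">Busan</td><td class="list_class bold">200</td></tr>
</table>
"""

result = count_classes(sample_html, ["addr", "list_class", "addr_point"])
print(result)
```

To try it on the real page, decode the downloaded bytes with the page's own encoding and pass the resulting string in place of `sample_html`. A count of 0 for `addr_point` would mean the class is not in the served HTML at all, so no selector can match it. In that case the element seen in the browser is probably added later (for example by JavaScript) or is only served under other conditions, though that is a guess that would need checking against the site.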
