Can't get href from a div even though I target its class

Question  Votes: 0  Answers: 1

I'm trying to get the links to all the products on this page: https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers

For example, for the Google Home Mini (Chalk), I should get https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe

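However, I can't even reach the div class that wraps the href link. I've tried different approaches, all using bs4. These are the two I was sure would work, but they don't: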

First attempt:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url_products = []
url = "https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers"
req = Request(url)
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")
data = soup.find_all('div', {'class': 'ProductTile__ProductImageWrapper-sc-1dlojg1-2 gRQAGx'})
for div in data:
    links = div.find_all('a')
    for a in links:
        print('https://www.officeworks.com.au/' + a['href'])
        url_products.append('https://www.officeworks.com.au/' + a['href'])
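A side note on the URL building in the first attempt: if the `href` values are root-relative (for example `/shop/officeworks/p/...`, an assumption based on the product URL above), concatenating them onto `'https://www.officeworks.com.au/'` produces a double slash. The standard library's `urllib.parse.urljoin` resolves relative links correctly; a minimal sketch (`absolute_url` is a hypothetical helper name):

```python
from urllib.parse import urljoin

BASE = "https://www.officeworks.com.au/"

def absolute_url(href: str) -> str:
    # urljoin handles root-relative ("/shop/..."), relative ("p/x")
    # and already-absolute ("https://...") hrefs without doubling slashes
    return urljoin(BASE, href)

print(absolute_url("/shop/officeworks/p/google-home-mini-chalk-sygminiwe"))
```

In the loop, `url_products.append(absolute_url(a['href']))` would then replace the string concatenation.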

Second attempt:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers')
soup = BeautifulSoup(r.content, 'lxml')
links = [item['href'] for item in soup.select('.gRQAGx > a')]

I believe I'm not targeting the right class, but I can't figure out which one it is. Thanks in advance!
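On the class question: names such as `ProductTile__ProductImageWrapper-sc-1dlojg1-2 gRQAGx` appear to be generated styled-components class names, whose hash suffix can change whenever the site is rebuilt, so selectors built on them are fragile. Once the HTML actually contains the product tiles (the answer below explains why the raw response may not), matching anchors by their URL pattern is usually sturdier. A standard-library sketch, assuming product pages live under `/shop/officeworks/p/` as in the example URL; `ProductLinkParser` and `product_links` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.officeworks.com.au/"
PRODUCT_PATH = "/shop/officeworks/p/"  # assumed product URL prefix

class ProductLinkParser(HTMLParser):
    """Collects unique product URLs from <a href> tags, ignoring CSS classes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(BASE, href)
            # keep only product pages, preserving first-seen order
            if PRODUCT_PATH in url and url not in self.links:
                self.links.append(url)

def product_links(html: str) -> list:
    parser = ProductLinkParser()
    parser.feed(html)
    return parser.links

sample = ('<div class="gRQAGx"><a href="/shop/officeworks/p/google-home-mini-chalk-sygminiwe">x</a></div>'
          '<a href="/about">y</a>')
print(product_links(sample))
```

With bs4, the equivalent idea is an attribute selector such as `soup.select('a[href*="/shop/officeworks/p/"]')`.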

python-3.x web-scraping beautifulsoup href
1 Answer

Votes: 1

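The reason you aren't getting the expected output is that the page content is loaded via JavaScript, so you won't be able to extract it until the JS has been rendered.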
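One way to confirm this diagnosis before reaching for a browser is to fetch the raw HTML (what both `urlopen` and `requests` receive) and search it for a product slug from the question. If the slug is absent, the links are being injected client-side. A small sketch; `appears_in_raw_html` is a hypothetical helper, and the browser-like `User-Agent` header is an assumption, added because some sites reject the default one:

```python
from urllib.request import Request, urlopen

def appears_in_raw_html(url: str, needle: str, timeout: float = 10) -> bool:
    """Return True if `needle` occurs in the unrendered HTML served at `url`."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        # no JavaScript runs here: this is exactly what bs4 gets to parse
        html = resp.read().decode("utf-8", errors="replace")
    return needle in html
```

For example, `appears_in_raw_html("https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers", "google-home-mini-chalk-sygminiwe")` checks whether the Google Home Mini link is present before rendering.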

You could use Selenium, but I don't recommend it, since it will slow your scraper down.

Or you could use HTMLSession from requests_html to render the JavaScript on the fly.

Otherwise, we can go straight to the source: the API that the JavaScript loads its data from.

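After tracking the XHR requests in the Network tab of the browser's Developer Tools (Ctrl+Shift+E in Firefox), we can see where the data comes from, so we can call that endpoint directly: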


import requests

# Algolia multi-query payload, copied from the XHR request the page sends
# (named payload rather than json to avoid shadowing the json module)
payload = {
    "requests": [
        {
            "indexName": "prod-product-wc-bestmatch-personal",
            "params": (
                "query=&hitsPerPage=24&maxValuesPerFacet=10&page=0"
                "&highlightPreTag=%3Cais-highlight-0000000000%3E"
                "&highlightPostTag=%3C%2Fais-highlight-0000000000%3E"
                "&clickAnalytics=true&optionalFilters=%5B%5D&sumOrFiltersScores=true"
                "&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)"
                "&facets=%5B%22rangedOnline%22%2C%22forestProductSchemeName%22%2C%22hardDriveType%22%2C%22bagStyle%22%2C%22socketType%22%2C%22fullSizeInnerDimensions%22%2C%22stapleSize%22%2C%22connectivity%22%2C%22smartHomeCompatibility%22%2C%22industryType%22%2C%22sizeCapacity%22%2C%22performancePrintResolution%22%2C%22handsetIncludedHandsets%22%2C%22usbFlashLidType%22%2C%22videoResolution%22%2C%22maximumPunchingCapacity%22%2C%22rangedRetail%22%2C%22protectionType%22%2C%22rulerLength%22%2C%22sizeNumber%22%2C%22deviceConnectivityTechnology%22%2C%22unitsOfMeasure%22%2C%22selfAdhesive%22%2C%22interfaceHardDrive%22%2C%22sharpenerSize%22%2C%22connectivityWifiBands%22%2C%22microphoneType%22%2C%22labellerKeyboardLayout%22%2C%22numberOfUsb30Ports%22%2C%22operatingSystemEdition%22%2C%22ringRingSize%22%2C%22performanceHealthMonitoringFunctions%22%2C%22connectivityTechnology%22%2C%22dualSimCompatible%22%2C%22audioSource%22%2C%22totalNumberOfLabels%22%2C%22brushShape%22%2C%22maxProcessorClockSpeed%22%2C%22operatingHand%22%2C%22powerBatteryTechnology%22%2C%22travelRegion%22%2C%22capacityBinder%22%2C%22licenceValidityPeriod%22%2C%22storageHardDriveCapacity%22%2C%22spineSize%22%2C%22rollLength%22%2C%22numberOfRings%22%2C%22lightBulbType%22%2C%22colour%22%2C%222SidedCopying%22%2C%22automaticDocumentFeederCapacity%22%2C%22automaticPaperFeed%22%2C%22performanceShredderCutType%22%2C%22performanceBrightness%22%2C%22displayResolution%22%2C%22labellingOfficeUseFacet%22%2C%22securityLevel%22%2C%22maxSupportedDocumentSize%22%2C%22bulkbuyOnline%22%2C%22staplingCapacity%22%2C%22storageIncludedFlashMemory%22%2C"
                "%22compatibabilityCustomFitAndroid%22%2C%22drawerNumberOfDrawers%22%2C%22storageInternalMemorySize%22%2C%22ramInstalledSize%22%2C%22100RecycledProduct%22%2C%22placementPlacingMounting%22%2C%22earPlacement%22%2C%22foldedDimensions%22%2C%22portsTotalNumberOfNetworkingPorts%22%2C%22powerBatteryChargeAmpHours%22%2C%22noiseCancelling%22%2C%22surfaceShape%22%2C%22labellingHomeUseFacet%22%2C%22sizeDescription%22%2C%22maxLoadWeight%22%2C%22numberOfPowerPorts%22%2C%22compatibabilityCustomFitApple%22%2C%22tsaApproved%22%2C%22chassisType%22%2C%22surgeSuppression%22%2C%22printingTechnologyPrinters%22%2C%22placementVesaMountCompatibility%22%2C%22boardSizeFacet%22%2C%22frameStyle%22%2C%22serviceProvider%22%2C%22bluetoothCompatibility%22%2C%22scannerType%22%2C%22photoCapacityQuantity%22%2C%22numberOfUsb20Ports%22%2C%22rulingType%22%2C%22learningSkillsFocus%22%2C%22licenceType%22%2C%22connectivityDisplayConnections%22%2C%22performanceMaxThickness%22%2C%22performanceResolution%22%2C%22paperWeightGsm%22%2C%22numberOfProcessorCores%22%2C%22fitsDevice%22%2C%22brushhairtype%22%2C%22opticalZoom%22%2C%22processorClockSpeed%22%2C%22labellingIndustrialUseFacet%22%2C%22performanceApproximateNumberOfImpressions%22%2C%222SidedPrinting%22%2C%22powerPowerType%22%2C%22interfaceType%22%2C%22printerConnectivityTechnology%22%2C%22numberOfReamsPerCarton%22%2C%22baseWheels%22%2C%22performanceEstimatedCartridgeYieldSheets%22%2C%22papersize%22%2C%22processorType%22%2C%22wallStrengthThickness%22%2C%22storageHardDriveCapacityComputingDevices%22%2C%22ciewhiteness%22%2C%22runTime%22%2C%22stampInking%22%2C%22switched%22%2C%22processorManufacturer%22%2C%22deviceCaseCompatibility%22%2C%22caseFeaturesNumberOfCompartments%22%2C%22displaySize%22%2C%222sidedScanning%22%2C%22glutenFree%22%2C%22restTime%22%2C%22operatingPlatformCompatibility%22%2C%22powerSource%22%2C%22touchScreen%22%2C%22displayPanelType%22%2C%22secondaryProcessorType%22%2C%22wastebinCapacityRange%22%2C%22softwareDistributionMedia%22%2C"
                "%22learningAgeRange%22%2C%22tapeWidth%22%2C%22storageStorageCapacity%22%2C%22cableLength%22%2C%22skillLevel%22%2C%22flightTime%22%2C%22energyRating%22%2C%22maximumRecommendedDailyUsage%22%2C%22contentLayout%22%2C%22deviceLocation%22%2C%22brand%22%2C%22numberOfUsb31Ports%22%2C%22lidIncluded%22%2C%22scannerScanResolution%22%2C%22portsNumberOfUsbChargePorts%22%2C%22envelopeSize%22%2C%22keyboardCompatibility%22%2C%22primaryCameraVideo%22%2C%22supportedMemoryCards%22%2C%22connectivityDisplayConnectionsPanels%22%2C%22up1Category%22%2C%22price%22%2C%22categorySeoPaths%22%2C%22rangedRetail%22%2C%22rangedOnline%22%2C%22price%22%2C%22brand%22%2C%22colour%22%2C%22audioSource%22%2C%22cableLength%22%2C%22up1Category%22%2C%22bulkbuyOnline%22%2C%22microphoneType%22%2C%22noiseCancelling%22%2C%22bluetoothCompatibility%22%2C%22powerBatteryTechnology%22%2C%22smartHomeCompatibility%22%5D"
                "&tagFilters=&facetFilters=%5B%5B%22categorySeoPaths%3Atechnology%2Faudio-speakers%2Fvoice-assistant-speakers%22%5D%5D"
            ),
        },
        {
            "indexName": "prod-product-wc-bestmatch-personal",
            "params": (
                "query=&hitsPerPage=1&maxValuesPerFacet=10&page=0"
                "&highlightPreTag=%3Cais-highlight-0000000000%3E"
                "&highlightPostTag=%3C%2Fais-highlight-0000000000%3E"
                "&clickAnalytics=false&optionalFilters=%5B%5D&sumOrFiltersScores=true"
                "&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)"
                "&attributesToRetrieve=%5B%5D&attributesToHighlight=%5B%5D&attributesToSnippet=%5B%5D"
                "&tagFilters=&analytics=false&facets=categorySeoPaths"
            ),
        },
    ]
}

r = requests.post(
    "https://k535caawve-dsn.algolia.net/1/indexes/*/queries"
    "?x-algolia-agent=Algolia%20for%20JavaScript%20(3.35.1)%3B%20Browser%20(lite)%3B%20react-instantsearch%205.4.0%3B%20JS%20Helper%202.26.1"
    "&x-algolia-application-id=K535CAAWVE&x-algolia-api-key=8a831febe0110932cfa06ff0e2024b4f",
    json=payload,
).json()

# The first result set holds the product hits; urlKeyword is the product slug
for item in r['results'][0]['hits']:
    print("Name: {:<65}, Url: {}".format(
        item['name'], f"https://www.officeworks.com.au/shop/officeworks/p/{item['urlKeyword']}"))
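The captured request only asks for the first 24 hits (`hitsPerPage=24&page=0`), so a category with more products would need pagination. The sketch below builds a reduced `params` string from a dict instead of the pasted blob and walks the pages until Algolia's `nbPages` is reached. Assumptions: the response has the `results[0]['hits']` shape used above plus Algolia's standard `nbPages` field; the facet options can be dropped because they only request filter counts, not hits; and `fetch_page` is a hypothetical callable you supply, which keeps the paging logic testable offline. It is untested against the live endpoint, whose public key may restrict which parameters are accepted.

```python
from urllib.parse import quote, urlencode

PRODUCT_BASE = "https://www.officeworks.com.au/shop/officeworks/p/"
CATEGORY = "technology/audio-speakers/voice-assistant-speakers"

def build_params(page: int, hits_per_page: int = 24) -> str:
    # Same category filter as the captured request; facet-count options omitted.
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return urlencode(
        {
            "query": "",
            "hitsPerPage": hits_per_page,
            "page": page,
            "filters": f'(categorySeoPaths:"{CATEGORY}")',
        },
        quote_via=quote,
    )

def all_product_urls(fetch_page) -> list:
    """Collect product URLs across all result pages.

    fetch_page(params: str) -> dict is a hypothetical callable that POSTs
    {"requests": [{"indexName": ..., "params": params}]} and returns the JSON.
    """
    urls = []
    page = 0
    while True:
        result = fetch_page(build_params(page))["results"][0]
        for hit in result["hits"]:
            urls.append(PRODUCT_BASE + hit["urlKeyword"])
        page += 1
        # nbPages: standard Algolia field (assumed; defaults to a single page)
        if page >= result.get("nbPages", 1):
            break
    return urls
```

For example, `fetch_page` could wrap the request from the answer: `lambda params: requests.post(API_URL, json={"requests": [{"indexName": "prod-product-wc-bestmatch-personal", "params": params}]}).json()`, where `API_URL` (a hypothetical name) holds the long Algolia URL above.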
