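I am writing a web scraper.
My goal: fetch data from a website.
My problem: I wrote an iterator that visits different pages of the site, such as https://www.kroger.com/pl/hair-care/21002?taxonomyId=21002&page=2&fulfillment=ais and https://www.kroger.com/pl/hair-care/21002?taxonomyId=21002&page=3&fulfillment=ais, but I get the same data back from each of them.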
def download_with_wait(self, url: str, wait_elem_id: Optional[str] = None,
                       callback: Optional[typing.Callable] = None):
    logger.info(f"Fetching {url}")
    self.driver.get(url)
    if self.browser == self.Browser.CHROME:
        logs = self.driver.get_log("performance")
        http_status_code = self.get_status(logs)
        if http_status_code is not None and http_status_code >= 400:
            logger.warning(f"Failed to fetch {url} with status code: {http_status_code}")
            return None
    if callback is not None:
        callback(self.driver)
    logger.info("Waiting for page to load")
    if wait_elem_id is not None:
        timeout = 60
        try:
            element_present = ec.presence_of_element_located((By.ID, wait_elem_id))
            WebDriverWait(self.driver, timeout).until(element_present)
        except TimeoutException:
            logger.warning("Timed out waiting for page to load")
            return None
    else:
        sleep(3)
    inner_html = self.driver.page_source
    # self.driver.save_screenshot("ss.png")
    return str(inner_html).encode("utf-8")
I tried opening both URLs manually in Google Chrome. Whichever of the two I enter, the page that actually loads is "https://www.kroger.com/pl/hair-care/21002?taxonomyId=21002&fulfillment=ais", which no longer contains the "page=n" string.
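To confirm this from inside the script rather than by eye, something like the following could compare the URL that was requested with the URL the browser ended up on. This is only a minimal sketch: `dropped_query_params` is a hypothetical helper name, not part of my scraper, and the `final` URL below is an assumption based on what the address bar showed in the manual test.

```python
from urllib.parse import parse_qs, urlsplit


def dropped_query_params(requested_url: str, final_url: str) -> dict:
    """Return the query parameters of requested_url that are missing
    or have a different value in final_url."""
    requested = parse_qs(urlsplit(requested_url).query)
    final = parse_qs(urlsplit(final_url).query)
    return {
        name: values
        for name, values in requested.items()
        if final.get(name) != values
    }


requested = "https://www.kroger.com/pl/hair-care/21002?taxonomyId=21002&page=2&fulfillment=ais"
# Assumed final address, taken from the manual test in Chrome.
final = "https://www.kroger.com/pl/hair-care/21002?taxonomyId=21002&fulfillment=ais"

print(dropped_query_params(requested, final))  # {'page': ['2']}
```

Inside `download_with_wait`, the second argument could be `self.driver.current_url`, read after `self.driver.get(url)` and after the wait has finished. A non-empty result for `page` would suggest that the site is rewriting the address and that every iteration is scraping the same listing.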