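I have a Google query that returns about 8,000 results with links, and I only want to scrape the links (URLs) from the search results. I can get the links on the first page; is there any way to scrape the following pages? Here is my code: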
for page in range(0,7):
    linkedin_urls = [url.text for url in linkedin_urls]
    #print(linkedin_urls)
    #loop to iterate through all links in the google search query
    for gol_url in linkedin_urls:
        print(gol_url)
        #driver.get(Xmen_url)
        #sel = Selector(text = driver.page_source)
        sleep(3)
    #Go back to google search
    driver.get('https://www.google.com')
    sleep(3)
    #locate search form by name
    search_query = driver.find_element(By.NAME, 'q')
    sleep(3)
    #Input search words
    search_query.send_keys('inurl:https://www.ama-assn.org/system/files')
    #Simulate return key
    search_query.send_keys(Keys.RETURN)
    #find next page icon in Google search
    #Next_Google_page = driver.find_element_by_link_text("Next").click()
    Next_Google_page = driver.find_element(By.LINK_TEXT, "Next").click()
    page += 1
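Google Search no longer paginates its results; it uses infinite scrolling instead. You need to scroll to the bottom of the page and wait for more results to load automatically. Once the page stops loading results on its own, you have to click "More results" to see more.
Here is sample code that uses Selenium to scroll until the end of the Google search results.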
import time
from selenium import webdriver

search_query_link = 'google_search_query_link'
driver = webdriver.Chrome()
driver.get(search_query_link)

# Remember the page height so we can tell when scrolling stops loading new results
current_height = driver.execute_script("return document.body.scrollHeight")
page_end = True
while page_end:
    # Scroll to the bottom and give the page time to load more results
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(5)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if current_height == new_height:
        # The height did not change, so nothing more was loaded
        page_end = False
    else:
        current_height = new_height

# Your code to extract all the links goes here
driver.quit()
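For the placeholder comment in the code above, one possible way to collect the links is sketched below. It is only an illustration: the helper name `extract_result_links` and its `prefix` parameter are hypothetical, and the `a[href]` selector is an assumption that simply grabs every anchor on the page rather than relying on Google's result markup, which changes often. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR` constant, used directly so the helper works with any driver-like object.

```python
def extract_result_links(driver, prefix=""):
    """Return the unique hrefs on the current page that start with `prefix`.

    Hypothetical helper: `driver` is expected to be a Selenium WebDriver
    (or anything exposing the same find_elements method).
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    anchors = driver.find_elements("css selector", "a[href]")

    seen = set()
    links = []
    for anchor in anchors:
        href = anchor.get_attribute("href")
        # Skip empty hrefs, links outside the wanted prefix, and duplicates.
        if href and href.startswith(prefix) and href not in seen:
            seen.add(href)
            links.append(href)
    return links
```

It has to be called before `driver.quit()`, for example `links = extract_result_links(driver, "https://www.ama-assn.org/system/files")`, using the URL prefix from the question's search query.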
You can further wrap this code in a loop so that it clicks "More results" each time it appears.
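A rough sketch of that outer loop might look like the following. The function name `load_all_results` and the `pause`, `max_rounds`, and `more_text` parameters are made up for illustration, and locating the control by the link text "More results" is an assumption about Google's current page; the actual element may need a different locator. The string `"link text"` is the value of Selenium's `By.LINK_TEXT` constant.

```python
import time


def load_all_results(driver, pause=5.0, max_rounds=50, more_text="More results"):
    """Scroll to the end of the page, clicking "More results" whenever it shows up.

    Hypothetical helper; returns how many times the link was clicked.
    """
    clicks = 0
    for _ in range(max_rounds):
        # Scroll until the page height stops changing (same idea as above).
        height = driver.execute_script("return document.body.scrollHeight")
        while True:
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(pause)
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == height:
                break
            height = new_height

        # "link text" is the string value of selenium's By.LINK_TEXT.
        # The link text itself is an assumption about the page.
        buttons = [
            b for b in driver.find_elements("link text", more_text)
            if b.is_displayed()
        ]
        if not buttons:
            break  # nothing left to click, so all results are loaded

        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the next batch time to start loading
    return clicks
```

Calling `load_all_results(driver)` after `driver.get(search_query_link)` and before extracting the links would replace the single `while` loop above. The `max_rounds` cap is there only so the loop cannot run forever if the page keeps offering more results.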