Scraping Google Maps reviews with Selenium


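I'm new to web scraping and have a question. The code below scrapes the reviews successfully, but long reviews are cut off at the "More" button, so I don't get the text hidden behind it.

Here is my code: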

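from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager  # third-party package: webdriver-manager
import time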

# Specify the URL of the business page on Google Maps
url = 'https://www.google.com/maps/place/Henn+na+Hotel+Tokyo+Asakusa+Tawaramachi/@35.7081735,139.7865534,17z/data=!3m1!5s0x60188eb8e5155075:0x3c1d343c96398eb6!4m11!3m10!1s0x60188f36ab21f05b:0x9241dab287ff62c9!5m2!4m1!1i2!8m2!3d35.7081692!4d139.7914243!9m1!1b1!16s%2Fg%2F11h0gzlhht?hl=en&entry=ttu'

# Create an instance of the Chrome driver
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

# Navigate to the specified URL
driver.get(url)

# Wait for the reviews to load
wait = WebDriverWait(driver, 20)  # Increased the waiting time
# Scroll down to load more reviews
body = driver.find_element(By.XPATH, "//div[contains(@class, 'm6QErb') and contains(@class, 'DxyBCb') and contains(@class, 'kA9KIf') and contains(@class, 'dS8AEf')]")
num_reviews = len(driver.find_elements(By.CLASS_NAME, 'wiI7pd'))
while True:
    body.send_keys(Keys.END)
    time.sleep(2)  # Adjust the delay based on your internet speed and page loading time
    new_num_reviews = len(driver.find_elements(By.CLASS_NAME, 'wiI7pd'))
    if new_num_reviews == num_reviews:
        # Scroll to the top to ensure all reviews are loaded
        body.send_keys(Keys.HOME)
        time.sleep(2)
        break
    num_reviews = new_num_reviews

# Wait for the reviews to load completely
wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'wiI7pd')))

# Extract the text of each review
review_elements = driver.find_elements(By.CLASS_NAME, 'wiI7pd')
reviews = [element.text for element in review_elements]

# Print the reviews
print(reviews)

# Close the browser
driver.quit()

Example result:

"Thank you for staying at the Henn na Hotel Tokyo Asakusa-Tawaramachi. \nWe appreciate that you took the time 
to input your valuable feedback.\nWe are sorry that you were not satisfied with the breakfast. We will share your …
More',"

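How can I get the full text? That is, while scrolling down I also want to capture the text hidden behind each "More" button.

So I tried to solve it myself: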

more = driver.find_element(By.XPATH, '//*[@id="ChdDSUhNMG9nS0VJQ0FnSUNsc29ITjJnRRAB"]/span[2]/button').click()

I copied the XPath of one "More" button and wrote the line above, but it doesn't work.

Thanks for reading. I look forward to an answer :)

python selenium-webdriver web-crawler
1 Answer

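So you want to click the "More" buttons in the reviews section and extract the full text of each review. Because there are several "More" buttons, I suggest looping over them: check one button at a time, click it if it is a "More" button, then move on to the next. I would use the following code: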

more_buttons = driver.find_elements(By.TAG_NAME, 'button')  # get every button on the page

for button in more_buttons:  # go through each button and open it, exposing the text
    if button.text.strip() == "More":  # compare the button label to find a match
        button.click()  # click the button to expand the review

After this, the full text of every review is visible and can be extracted.
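The two steps (expand, then extract) can be combined into one helper. This is only a minimal sketch under a few assumptions: the button label is exactly "More" (as in the example output in the question), the review text still lives in elements with the class `wiI7pd` used in the question's code, and the function name `expand_and_collect_reviews` is hypothetical. It clicks through `driver.execute_script` rather than `button.click()`, which may help when a button is scrolled out of view or covered by another element.

```python
def expand_and_collect_reviews(driver, button_label="More", review_class="wiI7pd"):
    """Click every button labelled `button_label`, then return all review texts.

    `driver` is expected to be a Selenium WebDriver. The label and class name
    are assumptions taken from the question and may change on Google's side.
    """
    # "tag name" and "class name" are the string values of By.TAG_NAME and
    # By.CLASS_NAME, so this helper needs no Selenium import of its own.
    for button in driver.find_elements("tag name", "button"):
        if button.text.strip() == button_label:
            # Click via JavaScript so the click is not blocked by overlays
            # or by the button being outside the visible area.
            driver.execute_script("arguments[0].click();", button)

    # Read the text only after expanding, so the full reviews are returned.
    return [element.text for element in driver.find_elements("class name", review_class)]
```

In the question's script, it would be called after the scrolling loop finishes and before `driver.quit()`, for example `reviews = expand_and_collect_reviews(driver)`.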
