I can't get YouTube comments with Selenium


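I'm using Selenium to scrape the comments of a YouTube video, and I need to get each author's name and their comment. However, no matter what I try, I can only get and print the element that contains all of the comments, not the individual comments. Here is what I have: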

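import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # Chrome is assumed here; any WebDriver works
wait = WebDriverWait(driver, 5)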
driver.get("https://www.youtube.com/watch?v=vMtr0dE0jRo")
# Scroll to the bottom of the page to load comments
driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

# You might need to adjust the range and sleep time depending on the number of comments
for _ in range(5):  # Adjust the range according to the number of comments
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(1)  # Adjust sleep time if necessary


for item in range(3):
    wait.until(EC.visibility_of_all_elements_located((By.TAG_NAME, "body")))
    time.sleep(2)

print("=====START CRAWLING DATA=====")
data = {}
comments = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#sections #contents")))
print(len(comments))
for comment in comments:
    # Author Name
    author_element = comment.find_element(By.CSS_SELECTOR, "#header-author h3")
    author_name = author_element.text.strip()  # Remove leading/trailing whitespace

    # Comment Text
    comment_text_element = comment.find_element(By.ID, "content-text")
    comment_text = comment_text_element.text.strip()
    
    # Print author name and comment text
    print("Author:", author_name)
    print("Comment:", comment_text)
    print()
print("Done")

But it raises:

NoSuchElementException                    Traceback (most recent call last)
Cell In[19], line 49
     28 # children of element
     29 # Function to find all children of an element recursively
     30 # def find_all_children(element):
   (...)
     45 #         find_all_children(child_element)
     46 # find_all_children(comment)
     47 for comment in comments:
     48     # Author Name
---> 49     author_element = comment.find_element(By.CSS_SELECTOR, "#header-author h3")
     50     author_name = author_element.text.strip()  # Remove leading/trailing whitespace
     52     # Comment Text

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webelement.py:417, in WebElement.find_element(self, by, value)
    414     by = By.CSS_SELECTOR
    415     value = f'[name="{value}"]'
--> 417 return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webelement.py:395, in WebElement._execute(self, command, params)
    393     params = {}
    394 params["id"] = self._id
--> 395 return self._parent.execute(command, params)
...
    (No symbol) [0x00007FF6388510C2]
    (No symbol) [0x00007FF638841914]
    BaseThreadInitThunk [0x00007FFDE4801FD7+23]
    RtlUserThreadStart [0x00007FFDE541D7D0+32]
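The traceback shows `find_element` failing inside the loop: `WebElement.find_element` raises `NoSuchElementException` as soon as one matched element has no descendant matching the child selector, whereas `find_elements` returns an empty list instead. A minimal diagnostic sketch, assuming Selenium 4 (where `By.CSS_SELECTOR` is the plain string `"css selector"`); the helper name `count_children` is hypothetical and not part of Selenium:

```python
# Same value as selenium's By.CSS_SELECTOR, written out so this sketch
# does not need Selenium installed to be imported.
CSS_SELECTOR = "css selector"


def count_children(elements, selector, by=CSS_SELECTOR):
    """Hypothetical helper: for each matched element, count the descendants
    matching `selector`.

    find_elements returns [] when nothing matches, so unlike find_element
    it never raises NoSuchElementException.
    """
    return [len(element.find_elements(by, selector)) for element in elements]
```

For example, `print(count_children(comments, "#header-author h3"))` right after the `wait.until(...)` call would show a number greater than 1 for an element that is a container holding several comments, and a 0 for any matched element that would make `find_element` raise.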

I expect the program to print each author name and comment like this:

Author: @Ian21344
Comment: Nice!

Author: @Daved
Comment: Looks good

python selenium-webdriver youtube web-crawler
1 answer (0 votes)

Change this line:

comments = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#sections #contents")))

to:

comments = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#contents #comment")))

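The node selector was not precise enough: `#sections #contents` matches container elements rather than individual comments, which is why only the whole block of comments could be printed. `#contents #comment` matches one element per comment, so the author and comment-text lookups inside the loop run once for each comment.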
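Putting the answer's selector into the question's loop, a sketch might look like the following. It assumes YouTube's markup still exposes `#header-author h3` and `#content-text` inside each `#contents #comment` element (YouTube changes its markup, so these may need updating), and it uses `find_elements` so that a comment missing an author or text is skipped rather than raising. `first_text` and `extract_comments` are hypothetical helper names, and the returned list of dicts stands in for the unused `data` variable from the question.

```python
# Same values as selenium's By.CSS_SELECTOR and By.ID, written out so this
# sketch does not need Selenium installed to be imported.
CSS_SELECTOR = "css selector"
ID = "id"


def first_text(element, by, value):
    """Hypothetical helper: stripped text of the first match, or None if nothing matches."""
    matches = element.find_elements(by, value)
    return matches[0].text.strip() if matches else None


def extract_comments(comment_elements):
    """Hypothetical helper: collect author/comment pairs from the elements
    matched by "#contents #comment"."""
    data = []
    for comment in comment_elements:
        author = first_text(comment, CSS_SELECTOR, "#header-author h3")
        text = first_text(comment, ID, "content-text")
        if author is None or text is None:
            # Not a regular comment, or not fully rendered yet: skip it.
            continue
        data.append({"author": author, "comment": text})
    return data
```

With a live page, pass it the elements from the answer's wait, `wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#contents #comment")))`, then loop over the result and print `item["author"]` and `item["comment"]` with the same `print("Author:", ...)` and `print("Comment:", ...)` calls as in the question.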

© www.soinside.com 2019 - 2024. All rights reserved.