Why can't Selenium find the ID of a reddit comment?

Question

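I have been using selenium to take screenshots of Reddit posts and comments, and I've run into a problem I haven't been able to fix with anything I found online. My code gives selenium the ID of the object I want to screenshot, and for the main reddit post itself this works well. For comments, however, it always times out (when using

EC.presence_of_element_located()

) or says the element cannot be found (when using

Driver.findElement()

).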

Here is the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def getScreenshotOfPost(header, ID, url):
    driver = webdriver.Chrome() #Using chrome to define a web driver
    driver.get(url) #Plugs the reddit url into the web driver
    driver.set_window_size(width=400, height=1600)
    wait = WebDriverWait(driver, 30)
    driver.execute_script("window.focus();")
    method = By.ID #ID is what I've found to be the most reliable method of look-up
    handle = f"{header}{ID}" #The header will be of the form "t3_" for posts and "t1_" for comments, and the ID is the ID of the post or comment.

    element = wait.until(EC.presence_of_element_located((method, handle)))
    driver.execute_script("window.focus();")

    #Write the element's PNG bytes to disk; the with-block closes the file
    with open(f'Post_{header}{ID}.png', "wb") as fp:
        fp.write(element.screenshot_as_png)

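I have tried searching by ID, CLASS, CSS_SELECTOR, and XPATH, and none of them work. I have double-checked that, regardless of the Reddit post, the form

t1_{the id of the comment}

is the correct ID for a comment. Increasing the web driver's wait time does not help. I'm not sure what the problem is.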

Thanks in advance for your help!

Answer 1

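I see what the problem is... the page contains a large number of nested shadow roots. If you're familiar with IFRAMEs, they behave similarly. Basically, you need to move Selenium's context into the IFRAME/shadow root so that Selenium can see the DOM inside it and continue. For a shadow root, that means locating the host element and searching from its shadow_root. You have to enter each shadow root, one at a time, and keep descending until you reach the element you want.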

Some sample code:

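from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

def test_recommended_code():
    driver = Chrome()

    driver.get('http://watir.com/examples/shadow_dom.html')

    # Locate the host element in the regular DOM, then enter its shadow root
    shadow_host = driver.find_element(By.CSS_SELECTOR, '#shadow_host')
    shadow_root = shadow_host.shadow_root
    # Searches made on the shadow root can see the elements inside it
    shadow_content = shadow_root.find_element(By.CSS_SELECTOR, '#shadow_content')

    assert shadow_content.text == 'some text'

    driver.quit()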

You can read more about this in this article.
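
To illustrate the "one shadow root at a time" descent described in this answer, here is a minimal sketch of a helper that walks a chain of shadow hosts and then looks up the target element. The names find_in_nested_shadow and screenshot_comment, the host_selectors argument, and the attribute selector used for the comment ID are all assumptions made for illustration. The actual shadow hosts on a Reddit page have to be found by inspecting the page, and the comment may or may not sit at the depth you expect.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def find_in_nested_shadow(context, host_selectors, target_selector):
    """Descend through a chain of shadow hosts, then find the target.

    context: a WebDriver, WebElement, or ShadowRoot (anything with find_element).
    host_selectors: CSS selectors of the shadow hosts, outermost first
        (hypothetical; they depend on the page being scraped).
    target_selector: CSS selector of the wanted element inside the last root.
    """
    for selector in host_selectors:
        host = context.find_element(CSS, selector)
        context = host.shadow_root  # WebElement property in Selenium 4+
    return context.find_element(CSS, target_selector)


def screenshot_comment(url, host_selectors, comment_id, path):
    """Hypothetical usage: save a screenshot of a single comment."""
    from selenium import webdriver  # imported here so the helper stays browser-free

    driver = webdriver.Chrome()
    try:
        driver.implicitly_wait(10)  # give the page time to render its elements
        driver.get(url)
        # Attribute selector, assuming the comment carries id="t1_<comment_id>"
        target = f'[id="t1_{comment_id}"]'
        element = find_in_nested_shadow(driver, host_selectors, target)
        with open(path, "wb") as fp:
            fp.write(element.screenshot_as_png)
    finally:
        driver.quit()
```

The helper uses CSS selectors throughout because lookups on a shadow root are most dependable with that strategy. Passing an empty host_selectors list reduces it to an ordinary find_element call on the driver.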
