Selenium - Why does driver.page_source only parse correctly when its value is written to a file?

Question

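Using Python, I want to get the full HTML code of reddit.com so I can search it for a string, but I only get a strange, small version of it. The code inside the if statement below does not run, but it should, because I know the string exists in the full page source (I found it both with the browser's dev tools and with its "View Page Source" feature):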

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.reddit.com')
driver.add_cookie({'name': 'reddit_session', 'value': '###session cookie value goes here###', 'path': '/', 'domain': 'reddit.com'})
driver.refresh()  # refresh the page to apply the cookie
source_html = driver.page_source

if 'user account' in source_html:
    print("String found.")

driver.close()

Here is the value of source_html, copied and pasted into a file. It is only 65,536 bytes long and doesn't make sense.

What does work is writing the variable's contents to a file:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.reddit.com')
driver.add_cookie({'name': 'reddit_session', 'value': '###session cookie value goes here###', 'path': '/', 'domain': 'reddit.com'})
driver.refresh()  # refresh the page to apply the cookie
source_html = driver.page_source

# write the page source to disk; explicit UTF-8 avoids encoding errors on Windows
with open('page.html', 'w', encoding='utf-8') as myfile:
    myfile.write(source_html)

driver.close()
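Writing a string to a file does not parse or repair it, so whatever ends up in page.html was already present in source_html. A minimal sketch, assuming only the standard library: it round-trips a large stand-in string through a temporary UTF-8 file and checks that the text comes back identical (fake_source and roundtrip_through_file are hypothetical names, not Selenium APIs). If that holds, the difference points at the way the 65,536-byte copy was made rather than at driver.page_source.

```python
import os
import tempfile


def roundtrip_through_file(text: str) -> str:
    """Write text to a temporary UTF-8 file, read it back, and delete the file."""
    # newline="" disables newline translation, so "\r\n" survives unchanged
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", newline="", suffix=".html", delete=False
    ) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        with open(path, encoding="utf-8", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Hypothetical stand-in for driver.page_source, well over 2**16 characters
    fake_source = "<html><body>" + "User account\r\n" * 50_000 + "</body></html>"
    restored = roundtrip_through_file(fake_source)
    print(len(fake_source), restored == fake_source)
```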

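And here are the 580,000 bytes of HTML that I am expecting, as written to the file.

I need to be able to search this HTML in Python without having to create a file.

I tried the following, without success:

While debugging, copying the string returned from driver.page_source, pasting it into Notepad, and saving it as an .html file.
Parsing the driver.page_source value with BeautifulSoup.
Executing JavaScript to get the whole DOM: getDOM = driver.execute_script('return document.documentElement.outerHTML')
Using time.sleep(5) to wait for the page to finish loading before reading driver.page_source (even though WebDriver always waits for the full response before continuing).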

Many thanks.

Answer 1

I found the problem.

Copying a variable's value in debug mode does not give you the full value. Also, the string in my if statement should have been 'User account' rather than 'user account', so the condition evaluated to False.
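Python's in operator is case-sensitive, which is why 'user account' never matched 'User account'. A minimal sketch of a case-insensitive check using str.casefold(), the standard-library method intended for caseless comparisons (contains_ignore_case and the HTML fragment are hypothetical):

```python
def contains_ignore_case(html: str, needle: str) -> bool:
    """Return True if needle occurs anywhere in html, ignoring letter case."""
    # casefold() is a stronger form of lower() meant for caseless matching
    return needle.casefold() in html.casefold()


if __name__ == "__main__":
    # Hypothetical fragment of driver.page_source
    page = '<a href="/user/me">User account</a>'
    print('user account' in page)                     # False: exact, case-sensitive match
    print(contains_ignore_case(page, 'user account'))  # True
```

With Selenium this would be called as contains_ignore_case(driver.page_source, 'user account'); simply fixing the literal to 'User account' works just as well when the exact casing is known.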

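When I saved the variable's value as an HTML file through debug mode, I only got part of what the variable actually held. The limit on debug variable values differs between development environments (NetBeans, Eclipse, VS Code, etc.). In VS Code it seems to be 65,536 bytes (2^16).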
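One way to sidestep the debugger's display limit is to measure the value from code: len() always reports the full string, whatever the IDE shows. A minimal sketch (describe_source is a hypothetical helper, and the closing-tag test is only a rough completeness heuristic):

```python
def describe_source(html: str) -> dict:
    """Summarize a page-source string so its real size can be checked in code."""
    return {
        "chars": len(html),                       # length in characters (code points)
        "utf8_bytes": len(html.encode("utf-8")),  # size on disk if saved as UTF-8
        # rough heuristic: a complete document usually ends with </html>
        "ends_with_html_close": html.rstrip().lower().endswith("</html>"),
    }


if __name__ == "__main__":
    # Hypothetical stand-in; with Selenium you would pass driver.page_source
    sample = "<html><body>é</body></html>"
    print(describe_source(sample))
```

Printing describe_source(driver.page_source) gives a number to compare against the 65,536-byte copy; a much larger value means the variable held the whole page all along.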

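I didn't take this any further, but if you do want to change the size limit for debug variables, I believe you would add a 'key: value' pair (which one depends on the language you are debugging) to the launch.json file.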

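This is how to do it for PHP. The launch.json file applies its settings when you launch into debug mode. While the file is open, there is a blue button labeled "Add Configuration..." that lists all the settings you can apply to the file.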
