Unable to read all of a web page's HTML with BeautifulSoup

Question

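I'm trying to use BeautifulSoup to extract a 10-K form from the SEC. Unfortunately, the code below doesn't show all of the HTML: the output starts somewhere in the middle of the page. The same code works fine on several other web pages I've tried. Any help is greatly appreciated. I'm new to Python and hope to learn more, as it's starting to grow on me :)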

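import urllib.request, urllib.error
from bs4 import BeautifulSoup
import ssl

# SSL context that skips certificate verification (unsafe outside of testing)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Download the raw HTML of the 10-K filing
url = "https://www.sec.gov/Archives/edgar/data/920148/000092014820000011/lh10-k2019.htm"
html = urllib.request.urlopen(url, context=ctx).read()

# Parse the page and print an indented version of the whole document,
# encoded to UTF-8 bytes
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify().encode("utf-8"))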
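A side note on the last line of the code: `soup.prettify()` returns a `str`, and `.encode("utf-8")` turns it into `bytes`. Printing a `bytes` object shows its repr, so the entire prettified document comes out as one very long line with literal `\n` sequences instead of real line breaks. The short sketch below uses a made-up HTML string in place of the real page to show the difference. Printing `soup.prettify()` directly, without `.encode(...)`, usually gives readable output when the console can display UTF-8.

```python
# Hypothetical stand-in for soup.prettify(); the real SEC page is far longer.
pretty = "<html>\n <body>\n  <p>\n   10-K\n  </p>\n </body>\n</html>"

as_bytes = pretty.encode("utf-8")

# print() on a bytes object shows its repr: b'...' on a single line, with each
# newline rendered as the two characters backslash and n.
print(as_bytes)

# print() on the str shows real line breaks, one tag per line.
print(pretty)

# The repr of the bytes contains no actual newline characters.
print("\n" in repr(as_bytes))  # False
print(pretty.count("\n") + 1)  # 7
```

For a long document like a 10-K, that single repr line can be enormous, which makes the terminal output hard to read from the start.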

Tags: python, html, parsing, beautifulsoup, html-parsing

1 Answer

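What is probably happening is that your terminal doesn't have enough room (its scrollback buffer is limited), so you only see the last part of the output, while the whole page is actually there. My guess is that the pages that worked are much shorter.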
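To confirm that the whole page was downloaded without relying on the terminal, one option is to write the prettified HTML to a file and check its size and its first and last characters. The helper below is a minimal sketch; `save_and_summarize` and the output filename are hypothetical names, and the sample string only stands in for `soup.prettify()`.

```python
import tempfile
from pathlib import Path


def save_and_summarize(text: str, path: Path, preview: int = 60) -> dict:
    """Write text to a UTF-8 file and return a short summary of it.

    Hypothetical helper: inspecting the file (or this summary) does not
    depend on how much output the terminal keeps in its scrollback.
    """
    path.write_text(text, encoding="utf-8")
    return {
        "chars": len(text),
        "lines": len(text.splitlines()),
        "head": text[:preview],   # should show the document's opening tags
        "tail": text[-preview:],  # should show the closing </html>
    }


if __name__ == "__main__":
    # Stand-in for soup.prettify(); with the real filing this string is much longer.
    sample = "<html>\n <body>\n  <p>\n   10-K\n  </p>\n </body>\n</html>"
    out_file = Path(tempfile.gettempdir()) / "lh10-k2019_pretty.html"  # hypothetical name
    info = save_and_summarize(sample, out_file)
    print(info["chars"], info["lines"])
    print(info["head"])
    print("saved to", out_file)
```

With the code from the question, passing `soup.prettify()` as `text` and opening the saved file in an editor should show whether the beginning of the document is really missing or just scrolled out of view.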

© www.soinside.com 2019 - 2024. All rights reserved.