How to extract a specific part of an HTML file, for example https://patents.google.com/patent/EP1208209A1/en?oq=medicinal+chemistry
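So far I have used BeautifulSoup to get a plain-text version of the HTML with all the tags stripped. However, I want my code to read only the claims section of the document above.
As far as I can tell, there are two divs with the class "flex flex-width style-scope patent-result".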
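from bs4 import BeautifulSoup

# sdata holds the HTML source of the page
soup = BeautifulSoup(sdata, "html.parser")
mydivs = soup.find_all("div", {"class": "flex flex-width style-scope patent-result"})
div_with_claims = mydivs[1]  # second matching div, expected to hold the claims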
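One thing that may be worth checking with this lookup: when the whole class string is passed, BeautifulSoup matches it against the `class` attribute as written, so a div listing the same classes in a different order is skipped. Below is a minimal sketch of selecting by CSS classes instead and guarding the index. `sample_html` is a made-up fragment, not the real Google Patents markup, so the counts it produces say nothing about the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the saved page; the real markup may differ.
sample_html = """
<div class="flex flex-width style-scope patent-result">Description text</div>
<div class="style-scope flex patent-result flex-width">Claims text</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Exact-string match: only finds divs whose class attribute is written in this order.
exact = soup.find_all("div", {"class": "flex flex-width style-scope patent-result"})

# CSS selector: matches any div carrying all four classes, in any order.
by_css = soup.select("div.flex.flex-width.style-scope.patent-result")

# Guard the index so a page with fewer than two matches does not raise IndexError.
div_with_claims = by_css[1] if len(by_css) > 1 else None
print(len(exact), len(by_css))
```

If the selector finds fewer divs than expected on the real file, the saved HTML probably differs from what the browser inspector shows.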
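from bs4 import BeautifulSoup

filename = 'C:/Users/xyz/.ipynb_checkpoints/EP1208209A1.html'
# Read the saved page; the context manager closes the file afterwards
with open(filename, 'r', encoding='utf-8') as html_file:
    source_code = html_file.read()
#print(source_code)
soup = BeautifulSoup(source_code, 'html.parser')
# Prints the text of the whole page, not just the claims
print(soup.get_text())
#mydivs = soup.find_all("div", {"class": "flex flex-width style-scope patent-result"})
#div_with_claims = mydivs[1]
#print(div_with_claims)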
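To illustrate the goal of printing only one section instead of the whole page, here is a rough sketch that pulls the text of the first element matching a CSS selector. The helper `extract_section_text`, the sample markup, and the selector `section[itemprop="claims"]` are all assumptions made for illustration; the container that actually holds the claims in the saved file has to be confirmed by inspecting it.

```python
from bs4 import BeautifulSoup


def extract_section_text(html, selector):
    """Return the text of the first element matching a CSS selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)
    if node is None:
        return None
    # Strip each text fragment and join the non-empty ones with newlines.
    return node.get_text(separator="\n", strip=True)


# Hypothetical markup; inspect the saved file to find the real container.
sample_html = """
<html><body>
<section itemprop="description"><p>Background of the invention.</p></section>
<section itemprop="claims"><div class="claim">1. A compound of formula I.</div>
<div class="claim">2. The compound of claim 1.</div></section>
</body></html>
"""

claims_text = extract_section_text(sample_html, 'section[itemprop="claims"]')
print(claims_text)
```

With the real file, `source_code` would be passed in place of `sample_html`, and a `None` result would mean the selector does not match anything in the saved page.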