How do I get the coordinates of characters in an HTML document?


<span class = 'ocrx_word' id = 'word_1_45' title = 'bbox 369 429 301 123;x_wconf 96'>refrence</span>

How can I use Python to extract only the values 369 429 301 123 from the markup above?

python python-3.x web-scraping beautifulsoup python-tesseract

1 Answer


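The simplest approach is probably to split the title attribute on the semicolon and keep everything before it. You can then split that part again on whitespace and keep only the numeric tokens.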

from bs4 import BeautifulSoup

tag = "<span class = 'ocrx_word' id = 'word_1_45' title = 'bbox 369 429 301 123;x_wconf 96'>refrence</span>"
soup = BeautifulSoup(tag, 'html.parser')
spans = soup.find_all('span')

for span in spans:
    # Keep the part before ';' ("bbox 369 429 301 123"), then drop non-numeric tokens.
    bbox_part = span.attrs['title'].split(';')[0]
    print([x for x in bbox_part.split() if x.isdigit()])  # ['369', '429', '301', '123']
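
The split-based approach returns the numbers as strings and relies on bbox being the first property in the title. If integers are needed, or the property order might differ, a regular expression is one possible alternative. The following is only a sketch: the helper name parse_bbox is hypothetical, and it assumes the title contains the pattern "bbox" followed by four non-negative integers, as in the example above.

```python
import re

# Assumed pattern: the word "bbox" followed by four whitespace-separated integers.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")

def parse_bbox(title):
    """Return the four bbox numbers from a title string as ints, or None if absent."""
    match = BBOX_RE.search(title)
    if match is None:
        return None
    return tuple(int(n) for n in match.groups())

title = "bbox 369 429 301 123;x_wconf 96"
print(parse_bbox(title))  # (369, 429, 301, 123)
```

In the loop above, the same helper could be called as parse_bbox(span.attrs['title']) for each span.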
