How do I get certain text from HTML tags in Python?

Question  Votes: 2  Answers: 3

I am building a Python MD5 decrypter on top of an API, but the API sends back HTML in its response. How can I get the text between the <font color=green> tags?

{"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}
python
3 Answers
2
votes

I suggest using an HTML parser such as Beautiful Soup:

>>> from bs4 import BeautifulSoup
>>> d = {"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}
>>> soup = BeautifulSoup(d['msg'], 'html.parser')
>>> soup.font.attrs
{'color': 'blue'}

soup.font returns the first <font> tag, and its attrs is a dict that maps attribute names to attribute values.

Update

To get the text "Jumpman#23":

>>> soup.find_all("font", {"color": "green"})[0].contents[0]
'Jumpman#23'
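
If the API response arrives as a raw JSON string rather than a ready-made dict, the same lookup could be wrapped as in the sketch below. The `raw` variable and the `extract_green_text` helper are hypothetical names for illustration, since the question does not show the actual API call. The sketch uses `find` instead of indexing into a list, so a response without a green tag yields `None` rather than an `IndexError`.

```python
import json

from bs4 import BeautifulSoup

# Assumed raw response body; the real request code is not shown in the question.
raw = '{"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}'


def extract_green_text(raw_body):
    """Return the text inside the first <font color=green> tag, or None."""
    payload = json.loads(raw_body)
    # Fall back to an empty string if the "msg" key is missing.
    soup = BeautifulSoup(payload.get("msg", ""), "html.parser")
    tag = soup.find("font", attrs={"color": "green"})
    return tag.get_text() if tag is not None else None


print(extract_green_text(raw))
```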

0
votes

If you know the target text will always come right after exactly <font color=green>, you can use simple string operations:

msg = "<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"
start_pattern = "<font color=green>"
stop_pattern = "<"
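# Note: find() returns -1 if start_pattern is absent; this assumes it is present.
start_index = msg.find(start_pattern) + len(start_pattern)
# Look for the next "<", searching from start_index onward.
stop_index = msg.find(stop_pattern, start_index)
print(msg[start_index:stop_index])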
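
Because `str.find` returns -1 when the marker is missing, the index arithmetic above would silently produce a wrong slice for a response without the green tag. One possible way to guard against that, using only the standard library, is `str.partition`. The helper name `text_after_marker` is made up for this sketch.

```python
def text_after_marker(msg, start_pattern="<font color=green>", stop_pattern="<"):
    """Return the text between start_pattern and the next stop_pattern, or None."""
    # partition() returns an empty separator when the marker is absent.
    _, found, rest = msg.partition(start_pattern)
    if not found:
        return None
    # Take everything up to the next "<"; if there is none, keep the remainder.
    return rest.partition(stop_pattern)[0]


msg = "<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"
print(text_after_marker(msg))
```

This still depends on the tag being spelled exactly as in the marker; a change in attribute quoting or spacing would make it return `None`.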

0
votes

You can use bs4 with the adjacent sibling combinator to target the second font tag:

from bs4 import BeautifulSoup as bs
s = {"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}
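# The 'lxml' parser requires the third-party lxml package to be installed.
soup = bs(s['msg'], 'lxml')
# 'font + font' matches a <font> that immediately follows a sibling <font>.
data = soup.select_one('font + font').text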
print(data)
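
The `font + font` selector relies on the green tag being the second of two adjacent font tags. If that ordering is not guaranteed, a CSS attribute selector that matches on the color might be a more direct option. This is a sketch under the assumption that the attribute value is always `green`; it uses the built-in `html.parser` so that lxml is not needed.

```python
from bs4 import BeautifulSoup

msg = "<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"

soup = BeautifulSoup(msg, "html.parser")
# Match the <font> tag by its color attribute instead of by its position.
tag = soup.select_one('font[color="green"]')
green_text = tag.get_text() if tag is not None else None
print(green_text)
```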