Python: split a string on commas only when they occur between two specific characters (> and <)

Question · Votes: 0 · Answers: 3
python string split
3 Answers
0
votes

I decided to use the following approach. I don't know whether it is the most efficient, but it works.

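# Mark each ">, <" item boundary with "|" (assumes "|" never occurs in the text), then split on it.
test5 = string.replace(">, <", ">|<")
options = test5.split("|")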

This approach does not require the string to be valid HTML; it relies only on the ">, <" sequence between items.
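
A variation on the same idea, as a sketch only: `re.split` with a lookbehind and a lookahead splits on a comma only when it sits directly between `>` and `<`, so no placeholder character is needed and a literal `|` in the text cannot cause a wrong split. The sample string `s` below is a shortened, made-up stand-in for the real input.

```python
import re

# Shortened, hypothetical sample; note the comma and the "|" inside the items.
s = '<div>a, b</div>, <div>c | d</div>, <div>e</div>'

# Split on a comma (plus optional whitespace) only when it is
# preceded by '>' and followed by '<'. The lookarounds consume nothing,
# so the tags stay attached to their items.
options = re.split(r'(?<=>),\s*(?=<)', s)

for option in options:
    print(option)
```

Commas inside an item are left alone because they are not preceded by `>`.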


0
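votes

Normally I would not recommend using regular expressions for anything related to XML/HTML. However, since your input has already been processed into a form that is no longer valid markup, I would say a regular expression is acceptable in this case, if you cannot fix it at the data source: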

import re

s = '<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, <div class="options mceEditable">The proteins may either be carriers or receptors only</div>, <div class="options mceEditable">It is a 3-layered lipid structure</div>'  

# Non-greedy match of each complete div, from its opening tag to its closing tag.
pattern = r'<div class="options mceEditable">.*?</div>'

matches = re.findall(pattern, s)
for m in matches:
    print(m)

Output:

<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>
<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>
<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>
<div class="options mceEditable">The proteins may either be carriers or receptors only</div>
<div class="options mceEditable">It is a 3-layered lipid structure</div>
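
If only the option text is needed rather than the full tags, one possible sketch is to add a capture group to the same pattern. `re.findall` then returns the captured part only. The sample string here is a shortened, hypothetical input in the same shape as the one above, and `re.DOTALL` is added on the assumption that an option might span several lines.

```python
import re

# Shortened, hypothetical sample in the same shape as the real input.
s = ('<div class="options mceEditable">First option, with a comma</div>, '
     '<div class="options mceEditable">Second option</div>')

# The parentheses capture only the text between the opening and closing tags.
pattern = r'<div class="options mceEditable">(.*?)</div>'
texts = re.findall(pattern, s, re.DOTALL)

print(texts)
```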

0
votes

You can use BeautifulSoup:

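# pip install beautifulsoup4
import bs4

# Name the parser explicitly; omitting it makes bs4 guess one and emit a warning.
soup = bs4.BeautifulSoup(s, 'html.parser')
divs = soup.find_all('div')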

Output:

>>> divs
[<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>,
 <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>,
 <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>,
 <div class="options mceEditable">The proteins may either be carriers or receptors only</div>,
 <div class="options mceEditable">It is a 3-layered lipid structure</div>]