Python: split a string on commas only where they appear between two specific characters (`>` and `<`)

Question

Answer 1

I decided to go with the following approach. I don't know whether it is the most efficient, but it works.

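# Turn only the ">, <" separators into a sentinel, then split on it.
# This assumes "|" never occurs elsewhere in the string.
test5 = string.replace(">, <", ">|<")
options = test5.split("|")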

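This approach does not need to know anything about the HTML tags in the string; it relies only on the `>, <` separator between elements.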
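The same idea can be sketched without a sentinel character, which matters if `|` might occur in the text. This is only an illustration, not part of the original answer: the helper name `split_between_tags` and the sample string are made up here, and it assumes the separator is always a comma directly after `>` and, apart from optional whitespace, directly before `<`.

```python
import re


def split_between_tags(text):
    # Split on a comma (plus optional whitespace) only when it sits
    # directly after '>' and directly before '<'. The lookbehind and
    # lookahead keep the angle brackets in the resulting pieces.
    return re.split(r'(?<=>),\s*(?=<)', text)


# Hypothetical sample: the first element contains a comma that must survive.
sample = '<div>first, with a comma</div>, <div>second</div>, <div>third</div>'
parts = split_between_tags(sample)
for part in parts:
    print(part)
```

Commas inside the element text are left alone because they are not preceded by `>`.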

Answer 2

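Normally I would not suggest using regular expressions for anything related to XML/HTML, but since your input has already been processed in some way and is no longer valid markup, I would say a regex is acceptable in this case if you cannot fix it at the data source: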

import re

s = '<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, <div class="options mceEditable">The proteins may either be carriers or receptors only</div>, <div class="options mceEditable">It is a 3-layered lipid structure</div>'  

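# The non-greedy .*? stops at the first closing </div>, so each element is matched separately.
pattern = r'<div class="options mceEditable">.*?</div>'

# re.U is already the default for str patterns in Python 3, so no flag is needed.
matches = re.findall(pattern, s)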
for m in matches:
    print(m)

Output:

<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>
<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>
<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>
<div class="options mceEditable">The proteins may either be carriers or receptors only</div>
<div class="options mceEditable">It is a 3-layered lipid structure</div>
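If only the option text is needed rather than the whole element, a capture group is one possible variation on the pattern above. This is a sketch under the same assumption that every element opens with exactly `<div class="options mceEditable">`; the names `sample`, `inner_pattern`, and `texts` are made up for this example, and the sample is a shortened version of the input.

```python
import re

# Shortened sample in the same shape as the input above.
sample = (
    '<div class="options mceEditable">The membrane is a dynamic structure, '
    'and its constituents are in constant movement.</div>, '
    '<div class="options mceEditable">It is a 3-layered lipid structure</div>'
)

# The parentheses form a capture group, so findall returns only the inner text.
# re.DOTALL lets '.' match newlines in case an option spans several lines.
inner_pattern = re.compile(
    r'<div class="options mceEditable">(.*?)</div>', re.DOTALL
)
texts = inner_pattern.findall(sample)
for text in texts:
    print(text)
```

With a single capture group, `re.findall` returns a list of the captured strings instead of the full matches.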

Answer 3

You can use BeautifulSoup:

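# pip install beautifulsoup4
import bs4

# s is the string defined in Answer 2; naming the parser avoids a bs4 warning.
soup = bs4.BeautifulSoup(s, 'html.parser')
divs = soup.find_all('div')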

Output:

>>> divs
[<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>,
 <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>,
 <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>,
 <div class="options mceEditable">The proteins may either be carriers or receptors only</div>,
 <div class="options mceEditable">It is a 3-layered lipid structure</div>]
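Where installing a third-party package is not an option, a similar result can be approximated with the standard library's `html.parser` module. This is not BeautifulSoup and not part of the original answer: it is a minimal sketch in which `DivTextCollector`, `div_texts`, and `sample` are hypothetical names, and it collects the text of each top-level `<div>` rather than the tag objects.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text content of each top-level <div> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of <div> tags currently open
        self.texts = []     # one entry per completed top-level <div>
        self._buffer = []   # text pieces of the <div> being read

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        # Text outside any <div>, such as the ", " separators, is ignored.
        if self.depth > 0:
            self._buffer.append(data)


def div_texts(html):
    parser = DivTextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


# Shortened sample in the same shape as the input above.
sample = (
    '<div class="options mceEditable">The membrane is a dynamic structure, '
    'and its constituents are in constant movement.</div>, '
    '<div class="options mceEditable">It is a 3-layered lipid structure</div>'
)
print(div_texts(sample))
```

Unlike the string-splitting approaches, a parser does not depend on the exact separator between the elements.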
