I am trying to parse some data in the following format, stored in a variable called data:
data = '''(def-instance Adelphi
(expenses thous$:7-10)
(academic-emphasis biology))
(def-instance Arizona-State
(expenses thous$:4-7)
(academic-emphasis fine-arts))'''
I want to split the data into a list so that the first record ends up in the first item and the second record in the second item, i.e.:
['(def-instance Adelphi
(expenses thous$:7-10)
(academic-emphasis business-administration)
(academic-emphasis biology))',
'(def-instance Arizona-State
(expenses thous$:4-7)
(academic-emphasis fine-arts))']
I tried re.split(r'\(*(\([^()]*\)*)*\)', data), but the result is slightly off and I don't know why. Any help would be greatly appreciated.
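You can do this by splitting the data into lines, iterating over them, looking for lines that contain )), and building the result list from the indices where they occur.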
data = data.split('\n')
result = []
prev = 0  # index of the first line of the current record
for idx, value in enumerate(data):
    # a line containing '))' closes the current record
    if '))' in value:
        result.append('\n'.join(data[prev:idx + 1]))
        prev = idx + 1
This outputs:
print(result)
#['(def-instance Adelphi\n (state newyork)\n (control private)\n (no-of-students thous:5-10)\n (male:female ratio:30:70)\n (student:faculty ratio:15:1)\n (sat verbal 500)\n (sat math 475)\n (expenses thous$:7-10)\n (percent-financial-aid 60)\n (no-applicants thous:4-7)\n (percent-admittance 70)\n (percent-enrolled 40)\n (academics scale:1-5 2)\n (social scale:1-5 2)\n (quality-of-life scale:1-5 2)\n (academic-emphasis business-administration)\n (academic-emphasis biology))', '(def-instance Arizona-State\n (state arizona)\n (control state)\n (no-of-students thous:20+)\n (male:female ratio:50:50)\n (student:faculty ratio:20:1)\n (sat verbal 450)\n (sat math 500)\n (expenses thous$:4-7)\n (percent-financial-aid 50)\n (no-applicants thous:17+)\n (percent-admittance 80)\n (percent-enrolled 60)\n (academics scale:1-5 3)\n (social scale:1-5 4)\n (quality-of-life scale:1-5 5)\n (academic-emphasis business-education)\n (academic-emphasis engineering)\n (academic-emphasis accounting)\n (academic-emphasis fine-arts))']
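The line-based loop above relies on every record ending with a line that contains )) and on no other line containing it. If that cannot be guaranteed, one possible alternative is to track parenthesis depth and cut each time the depth returns to zero. The following is only a sketch under that assumption: the function name split_top_level and the variable sample are made up for illustration, and sample is the abbreviated data from the question rather than the full file.

```python
def split_top_level(text):
    """Return the top-level parenthesized forms found in text."""
    forms = []
    depth = 0      # current nesting depth of parentheses
    start = None   # index where the current top-level form began
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i          # a new top-level form opens here
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                # the matching close of the top-level form
                forms.append(text[start:i + 1])
                start = None
    return forms


# Abbreviated sample taken from the question (assumption: the real data has more fields).
sample = """(def-instance Adelphi
(expenses thous$:7-10)
(academic-emphasis biology))
(def-instance Arizona-State
(expenses thous$:4-7)
(academic-emphasis fine-arts))"""

parts = split_top_level(sample)
for part in parts:
    print(repr(part))
```

Because it counts parentheses instead of looking for a fixed )) marker, this version does not depend on how the records are broken across lines, but it does assume the parentheses are balanced and that no parentheses appear inside quoted strings.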