Regex with double parentheses in Python


I am trying to parse some data, stored in a variable called data, that has the following format:

data = '''(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))'''

I want to split the data into a list so that the first record is the first item and the second record is the second item, i.e.:

['''(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))''',
'''(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))''']

I tried re.split(r'\(*(\([^()]*\)*)*\)', data), but the result is slightly off and I can't figure out why. Any help would be appreciated.
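A likely explanation for the attempt above: re.split treats every match of the pattern as a separator and discards it, and because this pattern contains a capturing group, the captured text is also inserted into the result list. Python's re module also has no recursive patterns, so one regex cannot match arbitrarily nested balanced parentheses. If every record starts with (def-instance at the beginning of a line (an assumption based on the sample shown), a sketch that avoids matching the parentheses at all is to split on the newline that precedes each record, using a lookahead so the record's opening text is kept:

```python
import re

data = """(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))"""

# Assumption: each record begins with "(def-instance" at the start of a line.
# Only the newline is consumed as the separator; the lookahead (?=...) checks
# for "(def-instance" without consuming it, so each record keeps its opening text.
# The pattern has no capturing groups, so nothing extra is inserted into the list.
records = re.split(r'\n(?=\(def-instance\b)', data)

for record in records:
    print(record)
    print('---')
```

If records can be indented, \n\s*(?=\(def-instance\b) would also absorb the leading whitespace.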

python regex split
1 Answer

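You can do this by iterating over the data line by line: a line containing )) marks the end of a record, so each time you find one, join the lines from the start of the current record up to that index and append the result to the list.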

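lines = data.split('\n')  # keep data intact instead of overwriting it with a list

result = []
prev = 0  # index of the first line of the current record

for idx, line in enumerate(lines):
    # A line containing '))' closes the current def-instance record
    if '))' in line:
        result.append('\n'.join(lines[prev:idx + 1]))
        prev = idx + 1  # the next record starts on the following line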

Run on the complete data set (the question shows only an excerpt), this prints:

print(result)
#['(def-instance Adelphi\n   (state newyork)\n   (control private)\n   (no-of-students thous:5-10)\n   (male:female ratio:30:70)\n   (student:faculty ratio:15:1)\n   (sat verbal 500)\n   (sat math 475)\n   (expenses thous$:7-10)\n   (percent-financial-aid 60)\n   (no-applicants thous:4-7)\n   (percent-admittance 70)\n   (percent-enrolled 40)\n   (academics scale:1-5 2)\n   (social scale:1-5 2)\n   (quality-of-life scale:1-5 2)\n   (academic-emphasis business-administration)\n   (academic-emphasis biology))', '(def-instance Arizona-State\n   (state arizona)\n   (control state)\n   (no-of-students thous:20+)\n   (male:female ratio:50:50)\n   (student:faculty ratio:20:1)\n   (sat verbal 450)\n   (sat math 500)\n   (expenses thous$:4-7)\n   (percent-financial-aid 50)\n   (no-applicants thous:17+)\n   (percent-admittance 80)\n   (percent-enrolled 60)\n   (academics scale:1-5 3)\n   (social scale:1-5 4)\n   (quality-of-life scale:1-5 5)\n   (academic-emphasis business-education)\n   (academic-emphasis engineering)\n   (academic-emphasis accounting)\n   (academic-emphasis fine-arts))']
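The )) test above assumes that each record's last line contains )) and that no inner line does; a field such as (a (b)) on its own line would end a record early. A more general sketch, assuming parentheses never appear inside quoted strings or comments in the data, tracks the nesting depth character by character and cuts a record whenever the depth returns to zero. The function name split_top_level is hypothetical:

```python
def split_top_level(text):
    """Return each top-level parenthesised group in text (hypothetical helper).

    Assumes parentheses are never quoted or escaped inside the data.
    """
    records = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i  # a new top-level record begins here
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                # Depth is back to zero: the record is complete
                records.append(text[start:i + 1])
            elif depth < 0:
                raise ValueError(f"unbalanced ')' at index {i}")
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    return records


data = """(def-instance Adelphi
   (expenses thous$:7-10)
   (academic-emphasis biology))
(def-instance Arizona-State
   (expenses thous$:4-7)
   (academic-emphasis fine-arts))"""

result = split_top_level(data)
print(result)
```

Because it only counts characters, this version does not depend on line breaks or on the (def-instance keyword, and it reports unbalanced input instead of silently producing a wrong split.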
© www.soinside.com 2019 - 2024. All rights reserved.