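Given the source code of a class definition, I am trying to extract every attribute together with its comment (an empty string "" if the attribute has no comment).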
class Player(Schema):
    score = fields.Float()
    """
    Total points from killing zombies and finding treasures
    """
    name = fields.String()
    age = fields.Int()
    backpack = fields.Nested(
        PlayerBackpackInventoryItem,
        missing=[PlayerBackpackInventoryItem.from_name("knife")],
    )
    """
    Collection of items that a player can store in their backpack
    """
For the example above, the expected parsing result is:
[
("score", "Total points from killing zombies and finding treasures"),
("name", ""),
("age", ""),
("backpack", "Collection of items that a player can store in their backpack")
]
My attempt below does not extract the comments correctly and gives this output:
[
('score', 'Total points from killing zombies and finding treasures'),
('name', ''),
('age', ''),
('backpack', '')
]
How can I fix the regular expression (or even the whole parsing logic) to handle the cases that appear in the example class code?
Thanks
import re

code_block = '''class Player(Schema):
    score = fields.Float()
    """
    Total points from killing zombies and finding treasures
    """
    name = fields.String()
    age = fields.Int()
    backpack = fields.Nested(
        PlayerBackpackInventoryItem,
        missing=[PlayerBackpackInventoryItem.from_name("knife")],
    )
    """
    Collection of items that a player can store in their backpack
    """
'''

def parse_schema_comments(code):
    # Regular expression pattern to match field names and multiline comments
    pattern = r'(\w+)\s*=\s*fields\.\w+\([^\)]*\)(?:\n\s*"""\n(.*?)\n\s*""")?'
    # Find all matches using the pattern
    matches = re.findall(pattern, code, re.DOTALL)
    # Process the matches to format them as required
    result = []
    for match in matches:
        field_name, comment = match
        comment = comment.strip() if comment else ""
        result.append((field_name, comment))
    return result

parsed_comments = parse_schema_comments(code_block)
print(parsed_comments)
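To see why the backpack comment is lost, it may help to look at what the field part of the pattern matches on its own. The sketch below is only an illustration (the names snippet, field_part, matched_text and rest are made up here). Because [^\)]* cannot cross a ")", the match ends at the ")" that closes from_name("knife"), so the optional comment group is then tried against "]," rather than a newline followed by the triple quotes, and it is skipped.

```python
import re

# Hypothetical, shortened copy of the backpack field from the question
snippet = '''    backpack = fields.Nested(
        PlayerBackpackInventoryItem,
        missing=[PlayerBackpackInventoryItem.from_name("knife")],
    )
    """
    Collection of items that a player can store in their backpack
    """
'''

# Only the field part of the original pattern, without the comment group
field_part = r'(\w+)\s*=\s*fields\.\w+\([^\)]*\)'

m = re.search(field_part, snippet)
matched_text = m.group(0)   # text consumed by the field part
rest = snippet[m.end():]    # text the optional comment group is tried against

print(repr(matched_text[-20:]))  # tail of the match: it stops inside the call
print(repr(rest[:5]))            # next characters: not a newline plus quotes
```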
pattern = r'(\w+)\s*=\s*fields\.\w+\(.*?\)\n(?:\s*"""\s*([\s\S]*?)\s*"""\s*)?'
Updating the pattern to the one above produces the following output:
[('score', 'Total points from killing zombies and finding treasures'),
('name', ''),
('age', ''),
('backpack', 'Collection of items that a player can store in their backpack')]
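Since the question also allows changing the whole parsing logic, one alternative might be to let Python's own parser handle nested parentheses instead of a regex. The following is a minimal sketch, not a drop-in replacement: the function name parse_schema_comments_ast and the sample string are made up for illustration, it assumes the code is syntactically valid Python, it assumes Python 3.8+ (where string literals parse as ast.Constant), and it collects every simple class-level assignment rather than only those that call fields.*. ast.parse only parses the text, so Schema and fields do not need to be importable.

```python
import ast
import textwrap

def parse_schema_comments_ast(code):
    """Return (attribute, comment) pairs for the first class found in `code`."""
    tree = ast.parse(textwrap.dedent(code))
    class_node = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))

    result = []
    last_was_field = False
    for node in class_node.body:
        # A simple assignment such as `name = fields.String()`
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            result.append((node.targets[0].id, ""))
            last_was_field = True
            continue

        # A bare string literal directly after a field is treated as its comment
        is_string = (isinstance(node, ast.Expr)
                     and isinstance(node.value, ast.Constant)
                     and isinstance(node.value.value, str))
        if is_string and last_was_field:
            name, _ = result[-1]
            result[-1] = (name, node.value.value.strip())
        last_was_field = False
    return result

# Hypothetical sample mirroring the class from the question
sample = '''class Player(Schema):
    score = fields.Float()
    """
    Total points from killing zombies and finding treasures
    """
    name = fields.String()
    age = fields.Int()
    backpack = fields.Nested(
        PlayerBackpackInventoryItem,
        missing=[PlayerBackpackInventoryItem.from_name("knife")],
    )
    """
    Collection of items that a player can store in their backpack
    """
'''

parsed = parse_schema_comments_ast(sample)
print(parsed)
```

The trade-off is that this approach rejects code that does not parse, whereas a regex can still scan partial or broken snippets.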