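I have some code that tries to find matching patterns in a string. I want to store these patterns in a JSON file, so I converted the Python strings to JSON format. However, it doesn't work: the matches I get are wrong and don't contain the right number of values.
This is what my Python list of patterns looks like: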
patterns = [
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",  # Vol. 12, no. 3, (2009), p. 118-120
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.\[(\d+)-(\d+)\]$",  # vol.1,no.7(2022),p.[1-6]
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+-\d+)-(\d+-\d+)$",  # vol.10,no.3(2018),p.03015-1-03015-4
    r"^vol\.(\d+),no\.([\d-]+)\((\d+)\)p\.(\d+)-(\d+)$",
    r"^vol(\d+),no\.([\d-]+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+[a-z]?),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",  # Vol. 35A, no. 3 (2004), p. 751-759
    r"^vol\.(\d+),no\.([\d,]+)\((\d+)\),p\.(\d+)-(\d+)$",  # Vol. 17, no. 7,8,9,10(2020), p. 573 - 582
    r"^vol\.(\d+),no\.([\w]+)\((\d+)\),p\.(\d+)-(\d+)$",  # vol.7,no.6C(2019),p.38-45
]
And this is how I have them in JSON:
{
  "patterns": [
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.\\[(\\d+)-(\\d+)\\]$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+-\\d+)-(\\d+-\\d+)$",
    "^vol\\.(\\d+),no\\.[\\d-]+\\((\\d+)\\)p\\.(\\d+)-(\\d+)$",
    "^vol(\\d+),no\\.[\\d-]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+[a-z]?),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.[\\d,]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.[\\w]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$"
  ]
}
Here is an example of how I use them, with prints added to help show the error:
import re

# load_config and row are defined elsewhere in my code.
# Raw string so the backslash in the Windows path is not treated as an escape.
config_filename = r"configuration\most_used_patterns_Vol_Iss_Year_Page.json"
config = load_config(config_filename)
if config is not None:
    patterns = config.get('patterns', [])
    for pattern in patterns:
        match = re.match(pattern, row)
        if match:
            print(pattern)
            print(match)
            groups = match.groups()
            print(groups[0])
            print(groups[1])
            print(groups[2])
            print(groups[3])
            volume, issue, year, start_page, end_page = match.groups()
Output:
^vol\.(\d+),no\.[\d,]+\((\d+)\),p\.(\d+)-(\d+)$
<re.Match object; span=(0, 29), match='vol.37,no.6,7(1987),p.370-377'>
37
1987
370
377
ValueError: not enough values to unpack (expected 5, got 4)
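When I don't read them from the JSON file and just put them into Python in the form shown above, it works. Can someone explain what is wrong with this JSON file? Or would a different file type be a better choice for storing these pattern strings?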
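It looks like something went wrong when you converted the patterns to JSON by hand. In four of them (the 4th, 5th, 7th and 8th), the parentheses around the issue-number group were dropped, for example no\\.[\\d,]+ instead of no\\.([\\d,]+). Those patterns therefore have only four capturing groups, so unpacking the match into five variables fails.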
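A quick way to find which patterns are affected is to compile each one and read the groups attribute of the compiled pattern, which holds its number of capturing groups. The sketch below inlines the JSON text from the question verbatim; the helper name bad_patterns and the expected count of five (volume, issue, year, start_page, end_page) are assumptions for illustration.

```python
import json
import re

# JSON text copied from the question. It sits inside a raw string, so the
# doubled backslashes reach json.loads unchanged.
QUESTION_JSON = r'''
{
  "patterns": [
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.\\[(\\d+)-(\\d+)\\]$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+-\\d+)-(\\d+-\\d+)$",
    "^vol\\.(\\d+),no\\.[\\d-]+\\((\\d+)\\)p\\.(\\d+)-(\\d+)$",
    "^vol(\\d+),no\\.[\\d-]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+[a-z]?),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.[\\d,]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.[\\w]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$"
  ]
}
'''

EXPECTED_GROUPS = 5  # volume, issue, year, start_page, end_page


def bad_patterns(patterns, expected=EXPECTED_GROUPS):
    """Return (index, group_count) for every pattern whose number of
    capturing groups differs from `expected`."""
    result = []
    for i, pattern in enumerate(patterns):
        count = re.compile(pattern).groups  # number of capturing groups
        if count != expected:
            result.append((i, count))
    return result


patterns = json.loads(QUESTION_JSON)["patterns"]
for index, count in bad_patterns(patterns):
    print(f"pattern #{index + 1} has {count} groups: {patterns[index]}")
```

With the question's JSON this reports patterns 4, 5, 7 and 8, each with four groups. The same check can be run on any pattern file before the matching loop.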
It is better to let Python create the JSON from your input data instead of escaping the strings by hand:
import json

patterns = [
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",  # Vol. 12, no. 3, (2009), p. 118-120
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.\[(\d+)-(\d+)\]$",  # vol.1,no.7(2022),p.[1-6]
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+-\d+)-(\d+-\d+)$",  # vol.10,no.3(2018),p.03015-1-03015-4
    r"^vol\.(\d+),no\.([\d-]+)\((\d+)\)p\.(\d+)-(\d+)$",
    r"^vol(\d+),no\.([\d-]+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+[a-z]?),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",  # Vol. 35A, no. 3 (2004), p. 751-759
    r"^vol\.(\d+),no\.([\d,]+)\((\d+)\),p\.(\d+)-(\d+)$",  # Vol. 17, no. 7,8,9,10(2020), p. 573 - 582
    r"^vol\.(\d+),no\.([\w]+)\((\d+)\),p\.(\d+)-(\d+)$",  # vol.7,no.6C(2019),p.38-45
]

# json.dump escapes every backslash for you.
with open("patterns.json", "w") as f:
    json.dump({"patterns": patterns}, f, indent=2)
This gives you a correct JSON file with the following content:
{
  "patterns": [
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.\\[(\\d+)-(\\d+)\\]$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+-\\d+)-(\\d+-\\d+)$",
    "^vol\\.(\\d+),no\\.([\\d-]+)\\((\\d+)\\)p\\.(\\d+)-(\\d+)$",
    "^vol(\\d+),no\\.([\\d-]+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+[a-z]?),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.([\\d,]+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.([\\w]+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$"
  ]
}
Reading the file back into Python and comparing it with the original list should confirm that the patterns were stored correctly.
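A minimal sketch of that check, using only the standard json, re, os and tempfile modules: it writes the pattern list with the same json.dump call, loads it back and compares. Two assumptions: it writes to a temporary directory rather than to patterns.json in the working directory, so no file is left behind, and it uses the row from the question's output as the test string.

```python
import json
import os
import re
import tempfile

# The pattern list, written as raw strings.
patterns = [
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.\[(\d+)-(\d+)\]$",
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+-\d+)-(\d+-\d+)$",
    r"^vol\.(\d+),no\.([\d-]+)\((\d+)\)p\.(\d+)-(\d+)$",
    r"^vol(\d+),no\.([\d-]+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+[a-z]?),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+),no\.([\d,]+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+),no\.([\w]+)\((\d+)\),p\.(\d+)-(\d+)$",
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patterns.json")
    # Write the file with json.dump...
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"patterns": patterns}, f, indent=2)
    # ...then read it back.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)["patterns"]

# The strings survive the round trip unchanged...
assert loaded == patterns
# ...and every pattern still has the five capturing groups the unpacking needs.
assert all(re.compile(p).groups == 5 for p in loaded)

# The row from the question's output now unpacks into five values.
row = "vol.37,no.6,7(1987),p.370-377"
result = None
for pattern in loaded:
    match = re.match(pattern, row)
    if match:
        volume, issue, year, start_page, end_page = match.groups()
        result = (volume, issue, year, start_page, end_page)
        break
print(result)
```

This prints ('37', '6,7', '1987', '370', '377'): the 7th pattern again captures the issue "6,7" as its own group, so the five-way unpacking succeeds.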