I want to extract the contract length from free text as a term in months. The free-text field contains values such as:
"2 x 5 year terms",
"3 further x 4 years",
"two(2) further terms of five(5) years each",
"Two (2) Years + Two (2) Years + Two (2) Years",
"1 years + 1 years + 1 years",
"2 x 3 years",
"1 year and 6 months",
I would like the output to be:
120 months,
144 months,
120 months,
72 months,
36 months,
72 months,
18 months
import re

def calculate_duration(term):
    term = term.lower()
    # Handle "x year terms" pattern
    match = re.match(r'(\d+) x (\d+) year terms?', term)
    if match:
        return int(match.group(1)) * int(match.group(2)) * 12
    # Handle "FURTHER TERMS OF x YEARS EACH" pattern
    match = re.match(r'further terms of (\d+) years each', term)
    if match:
        return int(match.group(1)) * 12
    # Handle "FURTHER TERMS OF x YEARS EACH" pattern
    match = re.match(r'further terms of ((?:\d+\s?\(\w+\)\s?)?(\d+)) years each', term)
    if match:
        return int(match.group(2)) * 12
    # Handle "x years + x years + x years" pattern
    match = re.match(r'(\d+) years(\s?\+\s?\d+ years)+', term)
    if match:
        return sum(int(match.group(1)) for group in match.groups()) * 12
    # Handle other patterns or simple year counts
    match = re.match(r'(\d+) years?', term)
    if match:
        return int(match.group(1)) * 12
    # Handle other cases or unknown patterns
    return None

# Example usage
terms = [
    "2 x 5 year terms",
    "3 further x 4 YEAR terms",
    "Two (2) Years + Two (2) Years + Two (2) Years",
    "1 years + 1 years + 1 years",
    "2 x 3 years"
]
for term in terms:
    duration = calculate_duration(term)
    print(f"{term}: {duration} months")
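“... I want to extract the contract length from free text as a term in months. ...”
Use the built-in eval function.
Walk through the text and append the corresponding values: numbers and operators.
When a "year" token is encountered, adjust the preceding value accordingly by multiplying it by 12.
From there, build a mathematical expression by joining the values.
Here is an example.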
import re

def parse(s: str):
    """Convert free text into a list of month values and operators."""
    e = []
    for x in s.split():
        if any(c.isdigit() for c in x):
            # Keep only the digits, e.g. 'two(2)' -> 2
            e.append(int(re.sub(r'\D', '', x)))
        elif 'year' in x.lower():
            # Convert the preceding number from years to months
            e[-1] *= 12
        elif x in ['x', 'of']:
            e.append('*')
        elif x in ['+', 'and']:
            e.append('+')
    return e

text = ['2 x 5 year terms',
        '3 further x 4 years',
        'two(2) further terms of five(5) years each',
        'Two (2) Years + Two (2) Years + Two (2) Years',
        '1 years + 1 years + 1 years',
        '2 x 3 years',
        '1 year and 6 months']

for string in text:
    # Join the tokens into an expression such as '2 * 60' and evaluate it
    exp = ' '.join(map(str, parse(string)))
    print(exp, '=', eval(exp))
Output
2 * 60 = 120
3 * 48 = 144
2 * 60 = 120
24 + 24 + 24 = 72
12 + 12 + 12 = 36
2 * 36 = 72
12 + 6 = 18
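Since eval runs whatever expression it is handed, it may be worth avoiding when the text comes from an untrusted source. Below is a minimal sketch of the same idea without eval, assuming the same token rules as parse above. The names tokenize and months_from_text are hypothetical and not part of the original answer. The token list is treated as a sum of products, and, unlike the eval version, two numbers with no operator between them are simply multiplied rather than raising a syntax error.

```python
import re


def tokenize(text: str) -> list:
    """Return month values and '*' / '+' operators found in the text."""
    tokens = []
    for word in text.lower().split():
        if any(ch.isdigit() for ch in word):
            # Keep only the digits, e.g. 'two(2)' -> 2
            tokens.append(int(re.sub(r"\D", "", word)))
        elif "year" in word and tokens and isinstance(tokens[-1], int):
            # Convert the preceding number from years to months
            tokens[-1] *= 12
        elif word in ("x", "of"):
            tokens.append("*")
        elif word in ("+", "and"):
            tokens.append("+")
    return tokens


def months_from_text(text: str):
    """Evaluate the tokens as a sum of products; None if no number is found."""
    tokens = tokenize(text)
    if not any(isinstance(tok, int) for tok in tokens):
        return None
    total = 0
    product = None  # running product of the current '+'-separated term
    for tok in tokens:
        if tok == "+":
            total += product or 0
            product = None
        elif tok == "*":
            continue  # multiplication is implied between adjacent numbers
        else:
            product = tok if product is None else product * tok
    return total + (product or 0)


if __name__ == "__main__":
    for s in ["2 x 5 year terms", "1 year and 6 months"]:
        print(s, "->", months_from_text(s), "months")
```

Numbers that are not followed by a "year" token are taken to be months already, which is how "1 year and 6 months" works in the original parse as well.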