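I want to take a flat formula string and split it into an array, dividing it on several factors. I'm a bit stuck on the parentheses and am looking for some help.
I have been using a regex scan plus a few filters to try to get the resulting array.
My current tests look like this: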
describe 'split algorithm' do
  it 'can split a flat algorithm' do
    algo = 'ABC * DEF * GHI Round(3) = JKL * MNO * PQR Round(0) = SAVE'
    actual = split_algo(algo)
    expected = ['ABC', '* DEF', '* GHI', 'Round(3)', '= JKL', '* MNO', '* PQR', 'Round(0)', '= SAVE']
    expect(actual).to eq expected
  end
  it 'can split an algorithm with parenthesis' do
    algo = '(ABC + DEF + (GHI * JKL)) - ((MNO + PQR + (STU * VWX)) * YZ) Round(0) + SUM(AAA) = SAVE'
    actual = split_algo(algo)
    expected = ['(', 'ABC', '+ DEF', '+', '(', 'GHI', '* JKL', ')', ')', '-', '(', '(', 'MNO', '+ PQR', '+', '(', 'STU', '* VWX', ')', ')', '* YZ', ')', 'Round(0)', '+ SUM', '(', 'AAA', ')', '= SAVE']
    expect(actual).to eq expected
  end
end
With the following code I can get the first test to pass:
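def split_algo(algorithm)
  # Match either " <operator> <token>" (operator and token together)
  # or a bare run of non-space characters.
  pattern = /(?:(\ (\*\ |\+\ |\-\ |\\\ |\=\ )\S*))|(\S*)/
  matches = algorithm.scan(pattern).map(&:compact)
  # Each match is an array of capture groups; keep the longest one, trimmed.
  arr = matches.map { |match| match.max_by(&:length).strip }
  arr.delete('') # drop the empty matches that \S* produces
  arr
end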
I have tried modifying pattern to accept a parenthesis matcher:
pattern = /(\(|\))|(?:(\ (\*\ |\+\ |\-\ |\\\ |\=\ )\S*))|(\S*)/
but this only captures the parenthesis at the start of the formula.
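The behavior can be reproduced on a shorter input. The sketch below applies the modified pattern with the same post-processing as split_algo; PAREN_PATTERN and scan_with_parens are names made up for illustration. The likely cause is that `\S*` is greedy and treats parentheses as ordinary non-space characters, so the parenthesis alternative only gets a chance where a new match attempt happens to begin on a parenthesis.

```ruby
# Illustrative names only; the pattern is the modified one from the question.
PAREN_PATTERN = /(\(|\))|(?:(\ (\*\ |\+\ |\-\ |\\\ |\=\ )\S*))|(\S*)/

def scan_with_parens(algorithm)
  algorithm.scan(PAREN_PATTERN)
           .map { |groups| groups.compact.max_by(&:length).strip }
           .reject(&:empty?)
end

p scan_with_parens('(ABC + (GHI * JKL))')
#=> ["(", "ABC", "+ (GHI", "* JKL))"]
```

Only the leading parenthesis comes out as its own token; the others stay attached to "+ (GHI" and "* JKL))".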
We can define the following regular expression.
R = /
# split after an open paren if not followed by a digit
(?<=\() # match is preceded by an open paren, pos lookbehind
(?!\d) # match is not followed by a digit, neg lookahead
[ ]* # match >= 0 spaces
| # or
# split before an open paren if paren not followed by a digit
(?= # begin pos lookahead
\( # match a left paren...
(?!\d) # ...not followed by a digit, neg lookahead
) # end pos lookahead
[ ]* # match >= 0 spaces
| # or
# split before a closed paren if paren not preceded by a digit
(?<!\d) # do not follow a digit, neg lookbehind
(?=\)) # match a closed paren, pos lookahead
[ ]* # match >= 0 spaces
| # or
# split after a closed paren
(?<=\)) # match a preceding closed paren, pos lookbehind
[ ]* # match >= 0 spaces
| # or
# split on spaces not preceded by an operator
(?<![*=+\/-]) # match is not preceded by one of '*=+\/-', neg lookbehind
[ ]+ # match one or more spaces
| # or
# split on spaces followed by an open paren
[ ]+ # match one or more spaces
(?=\() # match a left paren, pos lookahead
/x # free-spacing regex definition mode
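Several of these alternatives can match an empty string, because lookarounds assert a position without consuming any characters. The short sketch below isolates that idea with two simplified regexes (BEFORE_OPEN_PAREN and AROUND_OPEN_PAREN are illustrative names, not part of the regex above): splitting on a lookahead keeps the parenthesis in the output, and the `(?!\d)` guard leaves calls such as Round(3) alone.

```ruby
# Illustrative, simplified pieces of the full regex.
BEFORE_OPEN_PAREN = /(?=\((?!\d))/
AROUND_OPEN_PAREN = /(?<=\()(?!\d)|(?=\((?!\d))/

p 'SUM(AAA)'.split(/\(/)               #=> ["SUM", "AAA)"]   the paren is consumed
p 'SUM(AAA)'.split(BEFORE_OPEN_PAREN)  #=> ["SUM", "(AAA)"]  the paren is kept
p 'SUM(AAA)'.split(AROUND_OPEN_PAREN)  #=> ["SUM", "(", "AAA)"]
p 'Round(3)'.split(BEFORE_OPEN_PAREN)  #=> ["Round(3)"]      a digit follows the paren
```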
For the first example we obtain the following.
algo1 = 'ABC * DEF * GHI Round(3) = JKL * MNO * PQR Round(0) = SAVE'
expected1 = ['ABC', '* DEF', '* GHI', 'Round(3)', '= JKL', '* MNO',
'* PQR', 'Round(0)', '= SAVE']
algo1.split(R) == expected1
#=> true
For the second example we obtain the following.
algo2 = '(ABC + DEF + (GHI * JKL)) - ((MNO + PQR + (STU * VWX)) * YZ) Round(0) + SUM(AAA) = SAVE'
expected2 = ['(', 'ABC', '+ DEF', '+', '(', 'GHI', '* JKL', ')', ')', '-',
'(', '(', 'MNO', '+ PQR', '+', '(', 'STU', '* VWX', ')', ')',
'* YZ', ')', 'Round(0)', '+ SUM', '(', 'AAA', ')', '= SAVE']
algo2.split(R) == expected2
#=> true
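Since the specs in the question call split_algo, the split could be wrapped in a method of that name. This is only a sketch: SPLIT_RE is an illustrative constant name, and its alternatives restate the regex above in condensed free-spacing form.

```ruby
# SPLIT_RE is an illustrative name; the alternatives mirror the regex above.
SPLIT_RE = /
  (?<=\()(?!\d)[ ]*      # after an open paren, unless a digit follows
  | (?=\((?!\d))[ ]*     # before an open paren, unless a digit follows it
  | (?<!\d)(?=\))[ ]*    # before a closed paren, unless a digit precedes it
  | (?<=\))[ ]*          # after a closed paren
  | (?<![*=+\/-])[ ]+    # spaces not preceded by an operator
  | [ ]+(?=\()           # spaces followed by an open paren
/x

def split_algo(algorithm)
  algorithm.split(SPLIT_RE)
end

p split_algo('SUM(AAA) = SAVE')
#=> ["SUM", "(", "AAA", ")", "= SAVE"]
```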
The regular expression would conventionally be written as follows.
R = /(?<=\()(?!\d) *|(?=\((?!\d)) *|(?<!\d)(?=\)) *|(?<=\)) *|(?<![*=+\/-]) +| +(?=\()/
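In free-spacing mode I enclosed each space in a character class ([ ]); otherwise the spaces would be stripped out before the expression is evaluated. That is not necessary when the regex is written conventionally.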
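A small check of that behavior, using two throwaway patterns (PLAIN_SPACE and CLASSED_SPACE are illustrative names):

```ruby
PLAIN_SPACE   = /a b/x    # the bare space is ignored, so this behaves like ab
CLASSED_SPACE = /a[ ]b/x  # the space inside the character class is kept

p PLAIN_SPACE.match?('a b')    #=> false
p PLAIN_SPACE.match?('ab')     #=> true
p CLASSED_SPACE.match?('a b')  #=> true
```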
I did the following, which seems to work:
At the end of split_algo I added a call to a new method, split_paren(arr).
def split_paren(algo_arr)
  round_call = /Round\(\d*\)/
  algo_arr.flat_map do |step|
    # Steps containing Round(n) are kept whole.
    next [step] if step =~ round_call
    # Split around each parenthesis, keeping the parentheses as tokens.
    step.split(/(\(|\))/).reject(&:empty?).map(&:strip)
  end
end
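For completeness, here is roughly how the two methods fit together. The combined version is not shown above, so this is a condensed sketch that assumes split_algo simply hands its array to split_paren as its last step; the local names steps and round_call are illustrative.

```ruby
def split_paren(algo_arr)
  round_call = /Round\(\d*\)/
  algo_arr.flat_map do |step|
    next [step] if step =~ round_call # keep Round(n) in one piece
    step.split(/(\(|\))/).reject(&:empty?).map(&:strip)
  end
end

def split_algo(algorithm)
  pattern = /(?:(\ (\*\ |\+\ |\-\ |\\\ |\=\ )\S*))|(\S*)/
  steps = algorithm.scan(pattern)
                   .map { |groups| groups.compact.max_by(&:length).strip }
                   .reject(&:empty?)
  split_paren(steps) # the added call
end

p split_algo('(ABC + (GHI * JKL)) Round(0) = SAVE')
#=> ["(", "ABC", "+", "(", "GHI", "* JKL", ")", ")", "Round(0)", "= SAVE"]
```

One limitation of this approach: a step that contains Round(n) is never split, so a parenthesis directly attached to it, as in "Round(0))", would stay attached.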
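If anyone would like to respond with a better approach, please feel free to do so. Otherwise I'll accept my own answer and close this question out shortly.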