findall returns the whole regex match as its first element even though the pattern contains a group [duplicate]

Question

This question already has an answer here:

re.findall behaves weird (2 answers)

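I am calling the findall method on a compiled regex object, but for each match I get the whole expression match as well, even though the pattern contains a group.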

I am using Python 3.7.3.

import re
def emailfinder(spam):
   emailregx=re.compile(r'''(
   [a-zA-Z0-9%_+-.]+
   @
   [a-zA-Z0-9.-]+
   (\.[a-zA-Z]{2,4})
   )''',re.VERBOSE)
   return emailregx.findall(spam)
print(emailfinder('[email protected] blah monkey [email protected]'))

The output is [('[email protected]', '.com'), ('[email protected]', '.in')], but I expected ['.com', '.in'].
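The tuple order in that output follows from how Python numbers groups: by the position of each opening parenthesis, left to right. A minimal sketch, assuming a pattern with the same two-group structure as the question's (outer group around the whole address, inner group around the top-level domain). The address is a hypothetical placeholder, and the first character class is written with the hyphen last so it is literal.

```python
import re

# Same structure as the question's pattern: group 1 wraps the whole
# address, group 2 wraps only the top-level domain.
pattern = re.compile(r'([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4}))')

text = 'contact alice@example.com today'  # hypothetical address

m = pattern.search(text)
# Group 1 opens first, so it is the outer (whole-address) group.
print(m.group(1))  # alice@example.com
print(m.group(2))  # .com
print(m.groups())  # ('alice@example.com', '.com')

# With two groups, findall returns one tuple per match in group order,
# which is why the whole address appears as the first element.
print(pattern.findall(text))  # [('alice@example.com', '.com')]
```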

Answer 1

You have an extra pair of parentheses around the whole pattern, which creates two groups. Removing them makes it work:

import re
def emailfinder(spam):
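   # (\.[a-zA-Z]{2,4}) is now the only capturing group, so findall returns just the TLDs.
   # The hyphen goes last in the first character class so it is literal;
   # "+-." would be a range from '+' to '.' that also matches ','.
   emailregx=re.compile(r'''
   [a-zA-Z0-9._%+-]+
   @
   [a-zA-Z0-9.-]+
   (\.[a-zA-Z]{2,4})
   ''',re.VERBOSE)
   return emailregx.findall(spam)

print(emailfinder('[email protected] blah monkey [email protected]'))
# Output: ['.com', '.in']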
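If you want to keep an outer grouping for readability, a non-capturing group (?:...) groups without capturing, so the TLD remains the only capturing group. This is a sketch of that alternative rather than the answerer's code; the addresses are hypothetical placeholders. It also shows why the original character class [a-zA-Z0-9%_+-.] is risky: "+-." is a range from '+' (0x2B) to '.' (0x2E), which silently includes ','.

```python
import re

def emailfinder(spam):
    # (?: ... ) groups without capturing, so the outer parentheses can stay
    # while (\.[a-zA-Z]{2,4}) is the only capturing group.
    emailregx = re.compile(r'''(?:
        [a-zA-Z0-9._%+-]+      # local part (hyphen last, so it is literal)
        @
        [a-zA-Z0-9.-]+         # domain name
        (\.[a-zA-Z]{2,4})      # top-level domain, the only capturing group
    )''', re.VERBOSE)
    return emailregx.findall(spam)

# Hypothetical placeholder addresses.
print(emailfinder('alice@example.com blah monkey bob@example.in'))
# ['.com', '.in']

# "+-." inside a class is a range that also covers ',' ...
print(bool(re.fullmatch(r'[+-.]', ',')))  # True
# ... while a hyphen placed last is just a literal '-'.
print(bool(re.fullmatch(r'[.+-]', ',')))  # False
```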

Answer 2

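In re, a group marks the part of the match you want to capture, and when a pattern has groups, findall returns only what the groups capture. You have put the grouping in the wrong place.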

Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})')
>>> tlds = pattern.findall('[email protected] blah monkey [email protected]')
>>> print(tlds)
['.com', '.in']
>>>
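The rule both answers rely on is that the shape of findall's result depends on how many capturing groups the pattern has: none gives the whole matches, one gives that group's text, and two or more give a tuple per match. A minimal sketch of all three cases, plus re.finditer for when you need both the full address and the TLD; the addresses are hypothetical placeholders.

```python
import re

text = 'alice@example.com blah monkey bob@example.in'  # hypothetical addresses

# No groups: findall returns the whole matches.
no_group = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}', text)
print(no_group)    # ['alice@example.com', 'bob@example.in']

# Exactly one group: findall returns only that group's text.
one_group = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})', text)
print(one_group)   # ['.com', '.in']

# Two or more groups: findall returns one tuple of groups per match.
two_groups = re.findall(r'([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4}))', text)
print(two_groups)  # [('alice@example.com', '.com'), ('bob@example.in', '.in')]

# finditer yields match objects, so the whole match (group 0) and the
# TLD (group 1) are both available without an extra outer group.
pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})')
for m in pattern.finditer(text):
    print(m.group(0), m.group(1))
# alice@example.com .com
# bob@example.in .in
```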