Get all matches of a group [duplicate]

Question

This question already has answers here:

Capturing repeating subpatterns in Python regex (4 answers)

I am trying to access every match of a group. Here is an example of the regular expression I am using:

re.compile(r'([a-zA-Z0-9]+)@([a-zA-Z]+)(?P<domain>\.[a-zA-Z]+)*')

I have a group named "domain", and I am trying to access everything that group matched. For example, given the string:

"[email protected]"

I want the result:

[".subdomain",".more",".subdomain",".domain",".org"]

I cannot find a way to access them.
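To make the problem concrete, here is a minimal sketch of what Python's built-in re module returns for a repeated group. The address is a hypothetical stand-in, since the one in the question is obfuscated. A group that repeats keeps only the text from its final iteration, which is why the earlier parts seem to be unreachable.

```python
import re

# Hypothetical address standing in for the obfuscated one in the question.
address = "user@mail.subdomain.more.subdomain.domain.org"

pattern = re.compile(r'([a-zA-Z0-9]+)@([a-zA-Z]+)(?P<domain>\.[a-zA-Z]+)*')
match = pattern.match(address)

# The "domain" group repeats five times, but only the last iteration is kept.
last_domain = match.group('domain')
print(last_domain)
```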

Answer 1

I would probably use re.match first to assert that the input matches your pattern, then use re.findall to extract the components you want:

import re

inp = "[email protected]"
if re.match(r'[a-zA-Z0-9]+@[a-zA-Z]+(?:\.[a-zA-Z]+)*', inp):
    inp = inp[inp.index('@')+1:]
    parts = re.findall(r'\.[a-zA-Z]+', inp)
    print(parts)

This prints:

['.subdomain', '.more', '.subdomain', '.domain', '.org']
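The same validate-then-extract idea can be wrapped in a small helper. This is only a sketch: the function name domain_parts and the sample address are assumptions, and it uses re.fullmatch rather than re.match, which is a stricter choice than the snippet above because trailing text after the address is rejected.

```python
import re

ADDRESS_RE = re.compile(r'[a-zA-Z0-9]+@[a-zA-Z]+(?:\.[a-zA-Z]+)*')
PART_RE = re.compile(r'\.[a-zA-Z]+')


def domain_parts(address):
    """Return the dotted parts after the first host label, or [] if invalid."""
    # Step 1: check that the whole input has the expected shape.
    if not ADDRESS_RE.fullmatch(address):
        return []
    # Step 2: search only the text after the '@' for '.label' pieces.
    host = address[address.index('@') + 1:]
    return PART_RE.findall(host)


# Hypothetical address; the one in the question is obfuscated.
parts = domain_parts("user@mail.subdomain.more.subdomain.domain.org")
print(parts)
```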

Answer 2

import re

string = '[email protected]'
# re.findall already returns a list, so no extra list() call is needed
res2 = re.findall(r'[.]\w+', string)
print(res2)

Please check whether this helps.
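One caveat with searching the whole string: if the part before the '@' contains a dot, that piece is returned too. A short sketch with a hypothetical address shows the difference between searching the full address and searching only the host part.

```python
import re

# Hypothetical address with a dot in the local part.
address = "first.last@mail.example.org"

# Searching the whole address also picks up '.last' from before the '@'.
whole = re.findall(r'[.]\w+', address)

# Searching only the text after the '@' avoids that.
host_only = re.findall(r'[.]\w+', address.split('@')[1])

print(whole)
print(host_only)
```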

Answer 3

OK. Here is a complete answer that sums up all of my comments and suggestions on the other answers:

import re

s='[email protected]'
tokens = re.findall(r'[.]\w+', s.split('@')[1])
print(tokens)

Here is a proof of concept:

Python 3.7.4 (default, Aug 12 2019, 14:45:07) 
[GCC 9.1.1 20190605 (Red Hat 9.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> 
>>> s='[email protected]'
>>> tokens = re.findall(r'[.]\w+', s.split('@')[1])
>>> print(tokens)
['.subdomain', '.more', '.subdomain', '.domain', '.01', '.org']
>>>

If you want to strip the dots from the names, use this version instead:

import re

s='[email protected]'
tokens = re.findall(r'[.](\w+)', s.split('@')[1])
print(tokens)

It produces this output:

>>> tokens = re.findall(r'[.](\w+)', s.split('@')[1])
>>> print(tokens)
['subdomain', 'more', 'subdomain', 'domain', '01', 'org']
>>>

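As a final note, one advantage of using \w+ is that it also covers international characters in domain names (for str patterns in Python 3, \w is Unicode-aware by default):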

>>> s='hello.world@www.日本.com'
>>> tokens = re.findall(r'[.](\w+)', s.split('@')[1])
>>> print(tokens)
['日本', 'com']
>>>
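As a rough illustration of that last point, the sketch below compares the default Unicode behaviour with the re.ASCII flag, which restricts \w to [a-zA-Z0-9_]. Note that \w also accepts digits and underscores, which may or may not be what you want for host labels.

```python
import re

host = "www.日本.com"

# Default for str patterns in Python 3: \w matches Unicode word characters.
unicode_labels = re.findall(r'[.](\w+)', host)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so '日本' is skipped.
ascii_labels = re.findall(r'[.](\w+)', host, flags=re.ASCII)

print(unicode_labels)
print(ascii_labels)
```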

Answer 4

import re

string = '[email protected]'
domain = string.split('@')[1]  # keep only the part after the '@'
string3 = re.findall(r'[.]\w+', domain)
print(string3)

Hope this helps.
