Replacing numbers of different lengths in Python (re.sub)

    corpus = """In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers.
                In the UK, 044-113-496-1834 is a fictitious number.
                In Ireland, the number 353-020-917-1234 is fictitious.
                And in Australia, 061-970-654-321 is a fictitious number.
                311 is a joke."""

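I'm new to Python and am learning regular expressions. I'm trying to change every 7-, 11-, 12- and 13-digit number to zeros while keeping it looking like a phone number, e.g. turning 555-0198 into 000-0000. Is there a way to leave 311 as it is instead of turning it into zeros? Below is what I have been able to come up with.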

At first I tried this, but it turned every digit into a zero:

    import re

    for word in corpus.split():
        nums = re.sub(r"\d", "0", word)  # raw string avoids the invalid "\d" escape warning
        print(nums)

Then I tried this, but realized that it doesn't handle the 11- and 13-digit numbers correctly:

    import re

    def sub_nums():
        for word in corpus.split():
            nums = re.sub(r"(\d{1,4})-+(\d{1,4})", "000-0000", word)
            print(nums)

    sub_nums()
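Running both attempts on a few tokens from the corpus makes the two problems visible. This is a small illustrative sketch; `samples`, `attempt1` and `attempt2` are names introduced here, not part of the question's code:

```python
import re

# Illustrative tokens taken from the question's corpus (7-, 12- and 13-digit numbers, plus 311).
samples = ["555-0198", "1-206-5705-0100", "044-113-496-1834", "311"]

# Attempt 1: every digit is replaced, so 311 becomes 000 as well.
attempt1 = [re.sub(r"\d", "0", word) for word in samples]

# Attempt 2: each "digits-hyphen(s)-digits" pair is replaced by the fixed string
# "000-0000", so longer numbers end up with the wrong shape.
attempt2 = [re.sub(r"(\d{1,4})-+(\d{1,4})", "000-0000", word) for word in samples]

print(attempt1)
print(attempt2)
```

The first attempt zeroes 311 too. The second leaves 311 alone (it has no hyphen) but turns both 1-206-5705-0100 and 044-113-496-1834 into 000-0000-000-0000, because every digits-hyphen-digits pair is replaced by the same fixed string. Replacing each digit individually inside a match, rather than substituting a fixed template, is what keeps the original shape.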
2 Answers

The regex I used is:

    r'(?<!\S)(?:(?=(-*\d-*){7}(\s|\Z))[\d-]+|(?=(-*\d-*){11}(\s|\Z))[\d-]+|(?=(-*\d-*){12}(\s|\Z))[\d-]+|(?=(-*\d-*){13}(\s|\Z))[\d-]+)'

The 7-, 11-, 12- and 13-digit phone numbers share a repeating pattern, so I will only explain the one for 7-digit numbers:

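  1. `(?<!\S)` is a negative lookbehind that applies to all the alternatives. It says the phone number must not be preceded by a non-whitespace character. This double negative is equivalent to the positive lookbehind `(?<=\s|\A)`, which says the phone number must be preceded by a whitespace character or start the string. However, that would be a variable-length lookbehind, which the regex engine that ships with Python does not support (the `regex` package from the PyPI repository does).
  2. `(?=(-*\d-*){7}(\s|\Z))` is the lookahead for 7-digit phone numbers. It requires that the next characters consist of digits and hyphens followed by a whitespace character or the end of the string, and that there are exactly 7 digits.
  3. `[\d-]+` then does the actual matching of those digits and hyphens in the input.

See Regex Demo.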
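One way to check the lookbehind claim is to try compiling both forms with the built-in `re` module. The helper `compiles` below is a hypothetical name used only for this check:

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if the built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Positive lookbehind with alternatives of different widths (\s is 1 char, ^ is 0 chars).
variable_width = r"(?<=\s|^)\d+"
# Double-negative form used in the answer: a fixed one-character lookbehind.
fixed_width = r"(?<!\S)\d+"

print(compiles(variable_width))  # re rejects variable-width lookbehind
print(compiles(fixed_width))

# (?<!\S) succeeds at the start of the string and after whitespace,
# but not in the middle of a token such as "x311".
print(re.findall(fixed_width, "311 x311 and 42"))
```

`(?<!\S)` succeeds at position 0 because there is no preceding character that could be non-whitespace, so the single fixed-width assertion covers both the "after whitespace" and the "start of string" cases.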

    import re

    corpus = ("In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers. "
              "In the UK, 044-113-496-1834 is a fictitious number. "
              "In Ireland, the number 353-020-917-1234 is fictitious. "
              "And in Australia, 061-970-654-321 is a fictitious number. "
              "311 is a joke.")

    regex = r'(?<!\S)(?:(?=(-*\d-*){7}(\s|\Z))[\d-]+|(?=(-*\d-*){11}(\s|\Z))[\d-]+|(?=(-*\d-*){12}(\s|\Z))[\d-]+|(?=(-*\d-*){13}(\s|\Z))[\d-]+)'

    # For each matched phone number, replace every digit with 0 and keep the hyphens
    new_corpus = re.sub(regex, lambda m: re.sub(r'\d', '0', m[0]), corpus)
    print(new_corpus)

Prints:

    In the US 000-0000 and 0-000-0000-0000 are examples fictitious numbers. In the UK, 000-000-000-0000 is a fictitious number. In Ireland, the number 000-000-000-0000 is fictitious. And in Australia, 000-000-000-000 is a fictitious number. 311 is a joke.
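Since the four alternatives differ only in the digit count, one option is to generate them from a list of counts. This is a sketch under that assumption, not the answerer's code; `DIGIT_COUNTS`, `build_pattern` and `mask_numbers` are hypothetical names. It also factors the shared `[\d-]+` out of the alternation, which matches the same tokens:

```python
import re

# Hypothetical list of digit counts that should be masked.
DIGIT_COUNTS = (7, 11, 12, 13)

def build_pattern(counts):
    # One lookahead per count: exactly n digits (hyphens allowed around them),
    # followed by whitespace or the end of the string.
    alternatives = "|".join(
        rf"(?=(?:-*\d-*){{{n}}}(?:\s|\Z))" for n in counts
    )
    # (?<!\S): the token must start the string or follow whitespace.
    return re.compile(rf"(?<!\S)(?:{alternatives})[\d-]+")

PHONE = build_pattern(DIGIT_COUNTS)

def mask_numbers(text):
    # Replace each digit of a matched number with 0, keeping the hyphens.
    return PHONE.sub(lambda m: re.sub(r"\d", "0", m.group(0)), text)

corpus = ("In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers. "
          "In the UK, 044-113-496-1834 is a fictitious number. "
          "In Ireland, the number 353-020-917-1234 is fictitious. "
          "And in Australia, 061-970-654-321 is a fictitious number. "
          "311 is a joke.")
print(mask_numbers(corpus))
```

Like the original regex, this only masks numbers followed by whitespace or the end of the string, so a number directly followed by punctuation (for example `555-0198.`) is left unchanged.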


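I think it may be easier to use a pattern that first matches runs of digits separated by hyphens, and then check whether the number of digits in each match is 7, 11, 12 or 13.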
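A minimal sketch of that approach, assuming numbers are runs of digits joined by single hyphens (`TARGET_DIGIT_COUNTS`, `DIGIT_RUN`, `mask_if_phone_length` and `mask_numbers` are hypothetical names):

```python
import re

# Hypothetical set of digit counts that should be masked.
TARGET_DIGIT_COUNTS = {7, 11, 12, 13}
# Runs of digits joined by single hyphens, e.g. "311", "555-0198", "1-206-5705-0100".
DIGIT_RUN = re.compile(r"\d+(?:-\d+)*")

def mask_if_phone_length(m):
    token = m.group(0)
    digit_count = len(token) - token.count("-")  # token holds only digits and hyphens
    if digit_count in TARGET_DIGIT_COUNTS:
        return re.sub(r"\d", "0", token)  # zero the digits, keep the hyphens
    return token  # any other length (e.g. "311") is returned unchanged

def mask_numbers(text):
    return DIGIT_RUN.sub(mask_if_phone_length, text)

corpus = ("In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers. "
          "In the UK, 044-113-496-1834 is a fictitious number. "
          "In Ireland, the number 353-020-917-1234 is fictitious. "
          "And in Australia, 061-970-654-321 is a fictitious number. "
          "311 is a joke.")
print(mask_numbers(corpus))
```

Because the replacement is a function, it can inspect each match and return it untouched, which is how 311 survives. Unlike the lookaround regex, this pattern does not require whitespace around the number, so a number followed by a period (`555-0198.`) or glued to letters (`ab555-0198`) would also be masked.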