I'm trying to write a program that takes a postal code from the user and checks whether it is valid. So far I have:
postalCode = input("Postal code: ")
postalCode = postalCode.replace(" ", "")
postalCode = postalCode.lower()
letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
numbers = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
valid = True
for i in range(0, len(postalCode), 2):
    if postalCode[i] not in letters or postalCode[i+1] not in numbers:
        valid = False
        break
if(valid):
    print("Valid postal code.")
else:
    print("Not a valid postal code.")
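The code runs fine, but I know that regular expressions would be a more practical way to do this; I just can't figure out how they work.
The Canadian postal code format is: L / N / L N / L / N.
Thanks
A solution without regular expressions:
Get your facts straight - a-z is wrong; some letters are omitted because they look too similar to other characters: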
A  Newfoundland        B  Nova Scotia              C  Prince Edward Island
E  New Brunswick       G  Eastern Québec           H  Montréal and Laval
J  Western Québec      K  Eastern Ontario          L  Central Ontario
M  Greater Toronto     N  Southwestern Ontario     P  Northern Ontario
R  Manitoba            S  Saskatchewan             T  Alberta
V  British Columbia    X  NW Territories, Nunavut  Y  Yukon
Code:
def CheckCanadianPostalcodes(p, strictCapitalization=False, fixSpace=True):
    '''returns a Tuple of (boolean, string):
    - (True, postalCode) or
    - (False, error message)
    By default lower and upper case characters are allowed,
    a missing middle space will be substituted.'''
    pc = p.strip()                    # copy p, strip whitespace front/end
    if fixSpace and len(pc) == 6:
        pc = pc[0:3] + " " + pc[3:]   # if allowed and needed, insert missing space
    nums = "0123456789"               # allowed numbers
    alph = "ABCEGHJKLMNPRSTVWXYZ"     # allowed characters (WZ handled below)
    mustBeNums = [1,4,6]              # index of number
    mustBeAlph = [0,2,5]              # index of character (WZ handled below)
    illegalCharacters = [x for x in pc if x not in (nums + alph.lower() + alph + " ")]
    if strictCapitalization:
        illegalCharacters = [x for x in pc if x not in (alph + nums + " ")]
    if illegalCharacters:
        return (False, "Illegal characters detected: " + str(illegalCharacters))
    postalCode = [x.upper() for x in pc]            # copy to uppercase list
    if len(postalCode) != 7:                        # length validation
        return (False, "Length not 7")
    for idx in range(0,len(postalCode)):            # loop over all indexes
        ch = postalCode[idx]
        if ch in nums and idx not in mustBeNums:    # is a number, check index
            return (False, "Format not 'ADA DAD'")
        elif ch in alph and idx not in mustBeAlph:  # is a character, check index
            return (False, "Format not 'ADA DAD'")  # alpha / digit
        elif ch == " " and idx != 3:                # is the space in the middle
            return (False, "Format not 'ADA DAD'")
    if postalCode[0] in "WZ":                       # no W or Z as first char
        return (False, "Cant start with W or Z")
    return (True,"".join(postalCode))               # yep - all good

testCases = [(True,"A9A 9A9"), (True,"a9a 9a9"), (True,"A9A9A9"), (True,"a9a9a9"),
             (False,"w9A 9A9"), (False,"z9a 9a9"), (False,"a9a 9!9")]
for t in testCases:
    pc = CheckCanadianPostalcodes(t[1])  # output differs, see func description
    assert pc[0] == t[0], "Error in assertion: " + str(t) + " became " + str(pc)
    print(t[1], " => ", pc)

pp = input("Postal code: ")
print(CheckCanadianPostalcodes(pp))  # output differs, see func description
Output:
A9A 9A9 => (True, 'A9A 9A9')
a9a 9a9 => (True, 'A9A 9A9')
A9A9A9 => (True, 'A9A 9A9')
a9a9a9 => (True, 'A9A 9A9')
w9A 9A9 => (False, 'Cant start with W or Z')
z9a 9a9 => (False, 'Cant start with W or Z')
a9a 9!9 => (False, "Illegal characters detected: ['!']")
Postal code: b2c3d4
(False, "Illegal characters detected: ['d']")
Press any key to continue . . .
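This answer with regex (not the accepted one) provides the correct regular expression.
Number of possible postal codes (from Wikipedia):
Postal codes do not include the letters D, F, I, O, Q or U, and the first position also does not make use of the letters W or Z.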
[...]
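Canada Post reserves some FSAs for special functions, such as test or promotional purposes (e.g. H0H 0H0 for Santa Claus, see below), as well as for sorting mail bound for destinations outside Canada. [...]
Without W and Z as the first character, that leaves ABCEGHJKLMNPRSTVXY.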
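The two letter sets can also be derived from the exclusion rules instead of being typed out by hand. A minimal sketch, assuming only the rules quoted from Wikipedia above; the names EXCLUDED_EVERYWHERE, EXCLUDED_FIRST, any_position and first_position are chosen here for illustration:

```python
import string

# Letters never used in a Canadian postal code (per the excerpt above).
EXCLUDED_EVERYWHERE = set("DFIOQU")
# Letters additionally not used in the first position.
EXCLUDED_FIRST = set("WZ")

any_position = "".join(
    c for c in string.ascii_uppercase if c not in EXCLUDED_EVERYWHERE
)
first_position = "".join(c for c in any_position if c not in EXCLUDED_FIRST)

print(any_position)    # ABCEGHJKLMNPRSTVWXYZ
print(first_position)  # ABCEGHJKLMNPRSTVXY
```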
Edit: incorporated the changes suggested by jl-peyret
Based on your question, you could use:
import re
postalCode = input("Postal code: ")
# re.match anchors only at the start, so trailing characters are not rejected
pattern = re.match(r'[A-Z]{1}[0-9]{1}[A-Z]{1}\s[0-9]{1}[A-Z]{1}[0-9]{1}',postalCode)
if pattern:
    print('Valid postal code')
else:
    print('Invalid postal code')
You could also use the sub method and work with sequences, so that you don't have to repeat code the way the pattern above does.
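One possible reading of that suggestion, sketched under the assumption that whitespace is optional and case should be ignored: normalize the input with re.sub first, then match a pattern that repeats a single letter-digit group instead of spelling out each position. The helper name is_postal_code_format is hypothetical. Like the snippet above it uses [A-Z], so it checks only the letter/digit shape, not which letters are actually in use.

```python
import re

# Three repetitions of "letter, digit" cover all six positions.
SHAPE = re.compile(r"(?:[A-Z][0-9]){3}")

def is_postal_code_format(text):
    """Return True if text has the shape LNL NLN, ignoring case and whitespace."""
    normalized = re.sub(r"\s+", "", text).upper()  # drop all whitespace
    return SHAPE.fullmatch(normalized) is not None

for candidate in ["K1A 0B1", "k1a0b1", "K1A 0B", "1K1 A0B"]:
    print(candidate, is_postal_code_format(candidate))
```

Because all whitespace is removed before matching, this sketch does not insist that the space sits in the middle; use a stricter normalization if that matters.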
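I started by building two strings: one containing the alphabetic characters that may be used in any (legal) position in a postal code, and one containing the alphabetic characters that may be used in the first position.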
>>> any_position = 'ABCEGHJKLMNPRSTVWXYZ'
>>> first_position = 'ABCEGHJKLMNPRSTVXY'
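The following lines of code show the regular expression and how it performs on a few trial examples. If <_sre.SRE_Match object; ... does not appear beneath a call to the regex, the test failed for some reason.
Edit: I should explain what the regular expression does.
Postal codes containing lowercase letters are acceptable if you can arrange to convert them to uppercase first. If you want to accept them as they are, as Patrick Artner suggests, add re.I or re.IGNORECASE as the flags argument to re.compile (or to re.match when calling the module-level function). The match method of a compiled pattern does not take flags; its second argument is a start position.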
>>> import re
>>> postal_code_re = re.compile(r'^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ] [0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$')
>>> postal_code_re.match('H0H 0H0')
<_sre.SRE_Match object; span=(0, 7), match='H0H 0H0'>
>>> postal_code_re.match('A0A 0A0')
<_sre.SRE_Match object; span=(0, 7), match='A0A 0A0'>
>>> postal_code_re.match('W0A 0A0')
>>> postal_code_re.match('Q0A 0A0')
>>> postal_code_re.match('H0H 0Q0')
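The same pattern can be assembled from the two strings defined earlier, which keeps the letter sets in one place. A minimal sketch; passing re.IGNORECASE to re.compile is one way to accept lowercase input, and the names pattern and postal_code_ci_re are illustrative:

```python
import re

any_position = 'ABCEGHJKLMNPRSTVWXYZ'
first_position = 'ABCEGHJKLMNPRSTVXY'

# Build the pattern from the two letter sets instead of repeating them inline.
pattern = (
    f"^[{first_position}][0-9][{any_position}] "
    f"[0-9][{any_position}][0-9]$"
)
# Flags belong on re.compile, not on the compiled pattern's match method.
postal_code_ci_re = re.compile(pattern, re.IGNORECASE)

for code in ['H0H 0H0', 'h0h 0h0', 'W0A 0A0', 'H0H 0Q0']:
    print(code, bool(postal_code_ci_re.match(code)))
```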
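It is worth mentioning that this approach only tests the format of a code. That alone is not enough to establish validity, because many codes are not in use. For small batches, you could use one of the tools at https://www.canadapost.ca/web/en/pages/tools/default.page together with web scraping to check whether a code is actually in use, or even whether it is in a valid format.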