Converting numbers to English strings


Websites like http://www.easysurf.cc/cnvert18.htm and http://www.calculatorsoup.com/calculators/conversions/numberstowords.php try to convert a numeric string into English words, but their output does not sound natural.

For example, on http://www.easysurf.cc/cnvert18.htm:

[in]: 100456
[out]:  one hundred  thousand four hundred fifty-six

This site is a bit better, http://www.calculator.org/calculate-online/mathematics/text-number.aspx:

[in]: 100456
[out]: one hundred thousand, four hundred and fifty-six

[in]: 10123124001
[out]: ten billion, one hundred and twenty-three million, one hundred and twenty-four thousand, one 

But it breaks in some cases:

[in]: 10000000001
[out]: ten billion, , , one 

I've written my own version, but it involves a lot of rules and is capped at one billion. It is at http://pastebin.com/WwFCjYtt:

import codecs

def num2word (num):
  ones = {1:"one",2:"two",3:"three",4:"four",
          5:"five",6:"six",7:"seven",8:"eight",
          9:"nine",0:"zero",10:"ten"}
  teens = {11:"eleven",12:"twelve",13:"thirteen",
           14:"fourteen",15:"fifteen"}
  tens = {2:"twenty",3:"thirty",4:"forty",
          5:"fifty",6:"sixty",7:"seventy",
          8:"eighty",9:"ninety"}
  lens = {3:"hundred",4:"thousand",6:"hundred",7:"million",
          8:"million", 9:"million",10:"billion"#,13:"trillion",11:"googol",
          }

  if num > 999999999:
    return "Number more than 1 billion"

  # Ones
  if num < 11:
    return ones[num]
  # Teens
  if num < 20:
    word = ones[num%10] + "teen" if num > 15 else teens[num]
    return word
  # Tens
  if num > 19 and num < 100:
    word = tens[int(str(num)[0])]
    if str(num)[1] == "0":
      return word
    else:
      word = word + " " + ones[num%10]
      return word

  # First digit for thousands,hundred-thousands.
  if len(str(num)) in lens and len(str(num)) != 3:
    word = ones[int(str(num)[0])] + " " + lens[len(str(num))]
  else:
    word = ""

  # Hundred to Million  
  if num < 1000000:
    # First and Second digit for ten thousands.  
    if len(str(num)) == 5:
      word = num2word(int(str(num)[0:2])) + " thousand"
    # How many hundred-thousand(s).
    if len(str(num)) == 6:
      word = word + " " + num2word(int(str(num)[1:3])) + \
            " " + lens[len(str(num))-2]
    # How many hundred(s)?
    thousand_pt = len(str(num)) - 3
    word = word + " " + ones[int(str(num)[thousand_pt])] + \
            " " + lens[len(str(num))-thousand_pt]
    # Last 2 digits.
    last2 = num2word(int(str(num)[-2:]))
    if last2 != "zero":
      word = word + " and " + last2
    word = word.replace(" zero hundred","")
    return word.strip()

  left, right = '',''  
  # Below 100 million: one or two digits of millions.
  if num < 100000000:
    left = num2word(int(str(num)[:-6])) + " " + lens[len(str(num))]
    right = num2word(int(str(num)[-6:]))
  # Above 100 million and below 1 billion: three digits of millions.
  if num > 100000000 and num < 1000000000:
    left = num2word(int(str(num)[:3])) +  " " + lens[len(str(num))]
    right = num2word(int(str(num)[-6:]))
  if int(str(num)[-6:]) < 100:
    word = left + " and " + right
  else:  
    word = left + " " + right
  word = word.replace(" zero hundred","").replace(" zero thousand"," thousand")
  return word

print(num2word(int(input("Give me a number:\n"))))

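How can I make the script I've written accept numbers greater than one billion?

Is there any other way to get the same output?

Can my code be written less verbosely?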

python nlp
2 answers
3 votes

A more general way to solve this is to use repeated division (i.e. divmod) and hard-code only the special/edge cases that are really necessary.

For example, divmod(1034393, 1000000) -> (1, 34393), so you have effectively found the number of millions and are left with a remainder for further calculation.

A possibly more illustrative example: divmod(1034393, 1000) -> (1034, 393), which lets you take off groups of 3 decimal digits at a time from the right.
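
The repeated division described above can be sketched as a small helper. The name split_groups is hypothetical and not part of the original answer, and the sketch assumes a non-negative integer:

```python
def split_groups(num, base=1000):
    """Split a non-negative integer into groups of three decimal digits,
    least significant group first (hypothetical helper for illustration)."""
    if num == 0:
        return [0]
    groups = []
    while num:
        # divmod peels off the rightmost three digits as the remainder
        num, rest = divmod(num, base)
        groups.append(rest)
    return groups

print(split_groups(1034393))      # [393, 34, 1]
print(split_groups(10000000001))  # [1, 0, 0, 10]
```

The second call shows why empty groups need explicit handling: the zero groups are the ones that produced the ", , ," output quoted in the question.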

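In English we tend to group digits in threes, and similar rules apply to each group. This should be parameterized rather than hard-coded. For example, "303" could be three hundred and three million, three hundred and three thousand, or three hundred and three. Apart from the suffix, which depends on the position you are at, the logic should be the same. Edit: it looks like this is partly there already because of the recursion.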
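
One way to picture this parameterization: the same three-digit routine runs on every group, and only the suffix changes with the position. A minimal sketch, assuming hypothetical names (ONES, TENS, SUFFIXES, say_group, say_number) that do not appear in the original answer, and covering only numbers below one trillion:

```python
ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
        'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
        'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
        'eighty', 'ninety']
# One suffix per group of three digits, least significant group first.
SUFFIXES = ['', ' thousand', ' million', ' billion']

def say_group(num):
    """Words for 1..999, independent of where the group sits."""
    hundreds, rest = divmod(num, 100)
    parts = []
    if hundreds:
        parts.append(ONES[hundreds] + ' hundred')
    if rest:
        if rest < 20:
            word = ONES[rest]
        else:
            tens, ones = divmod(rest, 10)
            word = TENS[tens] + ('-' + ONES[ones] if ones else '')
        parts.append(word)
    return ' and '.join(parts)

def say_number(num):
    """Apply say_group to each group and attach the positional suffix."""
    if num == 0:
        return 'zero'
    if num < 0 or num >= 1000 ** len(SUFFIXES):
        raise ValueError('number out of range for this sketch')
    words = []
    position = 0
    while num:
        num, group = divmod(num, 1000)
        if group:  # skip empty groups instead of emitting a bare comma
            words.append(say_group(group) + SUFFIXES[position])
        position += 1
    return ', '.join(reversed(words))

print(say_number(303))          # three hundred and three
print(say_number(303000))       # three hundred and three thousand
print(say_number(303000000))    # three hundred and three million
print(say_number(10000000001))  # ten billion, one
```

Supporting larger numbers in this sketch would only mean appending more entries to SUFFIXES; the per-group logic stays untouched.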

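Here is a partial example of the kind of approach I mean, using a generator and operating on integers rather than doing lots of int(str(i)[..]) everywhere: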

say_base = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
    'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
    'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']

say_tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
    'eighty', 'ninety']

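def hundreds_i(num):
    # Yield the word fragments for 0 < num < 1000 (nothing is yielded for 0).
    hundreds, rest = divmod(num, 100)
    if hundreds:
        yield say_base[hundreds]
        yield ' hundred'
    if hundreds and rest:
        # Only say "and" when a hundreds part precedes the remainder.
        yield ' and '
    if 0 < rest < len(say_base):
        # 1..19 have their own names.
        yield say_base[rest]
    elif rest != 0:
        tens, ones = divmod(rest, 10)
        yield say_tens[tens]
        if ones > 0:
            yield '-'
            yield say_base[ones]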

assert "".join(hundreds_i(245)) == "two hundred and forty-five"
assert "".join(hundreds_i(999)) == 'nine hundred and ninety-nine'
assert "".join(hundreds_i(200)) == 'two hundred'

0 votes

Since the currently accepted answer has some problems with "zero", I am providing another answer here:

def number_to_text(n, *, hyphen="-", joiner="and", comma=","):
    unitNames = ["one", "two", "three", "four", "five", "six", "seven", "eight", 
                 "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
                 "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
    tensNames = ["twenty", "thirty", "forty", "fifty", 
                 "sixty", "seventy", "eighty", "ninety"]
    tripletNames = ["", "thousand"] + [s + "illion" 
                        for s in ["m", "b", "tr", "quadr", "quint", 
                                  "sext", "sept", "oct", "non"]
                    ] + [s + "decillion"
                        for s in ["", "un", "duo", "tre", "quattuor", 
                                  "quin", "sex", "septen", "octo", "novem"]
                    ] + ["vigintillion"]  # add as needed....
                
    def triplets(n):
        for tripletName in tripletNames:
            num = n % 1000
            n //= 1000
            if num == 0:
                continue
            hundreds = num // 100
            num %= 100
            tens =  num // 10 if num > 19 else 0
            num -= tens * 10
            yield ((unitNames[hundreds-1] + " hundred " if hundreds else "")
                + (joiner + " " if joiner and (n or hundreds) and (tens or num) else "") 
                + (tensNames[tens-2] + (hyphen if num else " ") if tens else "")
                + (unitNames[num-1] + " " if num else "")
                + tripletName).strip()
            if n == 0:
                return
        raise ValueError("number too large for this converter")

    return (comma + " ").join(reversed(list(triplets(n)))) if n else "zero"

Example call:

print(number_to_text(1234567890123456789, hyphen="-", joiner="and", comma=","))

Output:

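one quintillion, two hundred and thirty-four quadrillion, five hundred and sixty-seven trillion, eight hundred and ninety billion, one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine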

If you don't like the commas or the "and", pass an empty string for the corresponding option:

print(number_to_text(101000001, hyphen="-", joiner="", comma=""))

one hundred one million one

Note: this uses the short scale. If you need the long scale, update the tripletNames list accordingly.
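
As an illustration of that note, a long-scale list could be built as follows. This is only a sketch: long_scale_names is a hypothetical name, the list is cut off after "quint", and it assumes the long-scale variant that uses "-illiard" for the intermediate powers (milliard for 10**9, billion for 10**12).

```python
# Each entry names one group of three digits, least significant first,
# matching the order that tripletNames uses.
prefixes = ["m", "b", "tr", "quadr", "quint"]
long_scale_names = ["", "thousand"]
for prefix in prefixes:
    long_scale_names.append(prefix + "illion")   # 10**6, 10**12, 10**18, ...
    long_scale_names.append(prefix + "illiard")  # 10**9, 10**15, 10**21, ...

print(long_scale_names[:6])
# ['', 'thousand', 'million', 'milliard', 'billion', 'billiard']
```

Assigning a list like this to tripletNames inside number_to_text would switch the output to that long-scale variant without touching the triplet logic.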
