Python lex - TypeError: Unknown text

Question

I am trying to write a simple lex parser. This is the code so far:

from ply import lex

tokens = (
    'COMMENT',
    'OTHER'
)

# A line that starts with '#'
t_COMMENT = r'^\#.*\n'

# A line that does not start with '#'
t_OTHER = r'^[^\#].*\n'

def t_error(t):
    raise TypeError("Unknown text '%s'" % (t.value,))

lex.lex()

# yaml is presumably the text of the input file
lex.input(yaml)
for tok in iter(lex.token, None):
    print repr(tok.type), repr(tok.value)

But it fails to parse this simple input file:

    # This is a real comment
    #And this one also

    #/*
    # *
    # *Variable de feeu
    # */
    ma_var: True

    It is done, over, kaput

It produces the following output:

l
'COMMENT' '# This is a real comment\n'
Traceback (most recent call last):
  File "parser_adoc.py", line 62, in <module>
    main2()
  File "parser_adoc.py", line 57, in main2
    for tok in iter(lex.token, None):
  File "/usr/lib/python2.7/site-packages/ply/lex.py", line 384, in token
    newtok = self.lexerrorf(tok)
  File "parser_adoc.py", line 44, in t_error
    raise TypeError("Unknown text '%s'" % (t.value,))
TypeError: Unknown text '#And this one also

#/*
# *
# *Variable de feeu
# */
ma_var: True

this is done
'

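In short, I defined 2 regular expressions:

One for lines that start with #
One for lines that do not start with #

But it does not work. I don't understand what is wrong with my regular expressions.

Can you help?

Simon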

Answer 1

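In Python regular expressions (which PLY uses), ^ refers to the beginning of the string, not the beginning of a line, unless multiline mode is set. Since both of your rules start with ^, they can only match on the first line.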
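To see this outside of PLY, here is a small sketch using only the standard `re` module. It assumes the lexer behaves roughly like calling `pattern.match(text, pos)` at successive positions; the sample string and variable names are made up for illustration.

```python
import re

text = "# first\n#second\n"

# Same COMMENT pattern as in the question, compiled without multiline mode.
anchored = re.compile(r'^\#.*\n')

# At position 0, ^ matches because this is the start of the string.
first = anchored.match(text, 0)

# Continuing where the first match ended, ^ no longer matches:
# the position is the start of a line, but not the start of the string.
second = anchored.match(text, first.end())

# With re.MULTILINE, ^ also matches just after a newline.
multiline = re.compile(r'^\#.*\n', re.MULTILINE)
third = multiline.match(text, first.end())

print(repr(first.group()))
print(second)
print(repr(third.group()))
```

The second match attempt returns `None`, which corresponds to the point where the lexer in the question gives up and calls `t_error`.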

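You could fix this by wrapping the regular expressions in (?m:...), which enables multiline mode, but that is not even necessary here. Instead, you can simply remove the ^ from the beginning of your rules and it will work the way you want. Since both of your rules always match an entire line, the next token will always start at the beginning of a line - there is no need to anchor them.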
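The reasoning can be checked with a minimal stand-in for the lexer loop. This is not PLY itself, only a sketch built on the standard `re` module; the `RULES` list, the `tokenize` helper, and the sample text are hypothetical names introduced for illustration. The patterns are the ones from the question with the leading ^ removed.

```python
import re

# The question's two rules without the ^ anchors.
RULES = [
    ("COMMENT", re.compile(r'\#.*\n')),
    ("OTHER", re.compile(r'[^\#].*\n')),
]

def tokenize(text):
    """Yield (type, value) pairs, trying each rule at the current position."""
    pos = 0
    while pos < len(text):
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                yield name, m.group()
                pos = m.end()  # each token ends right after a newline
                break
        else:
            # No rule matched here, mirroring t_error in the question.
            raise TypeError("Unknown text %r" % (text[pos:],))

sample = "# This is a real comment\n#And this one also\nma_var: True\n"
tokens_found = list(tokenize(sample))
for tok_type, tok_value in tokens_found:
    print(repr(tok_type), repr(tok_value))
```

Because every match consumes a whole line including its newline, each new match attempt starts at the beginning of a line without any anchor. Dropping the ^ also avoids relying on (?m:...), whose scoped-flag syntax needs Python 3.6 or later, while the traceback in the question shows Python 2.7. Note that the sample has no blank lines and ends with a newline; the rules as written expect every line to end in `\n`.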
