How do I tokenize Python source code that has syntax errors?

Question

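I am trying to tokenize Python source code that contains syntax errors, so that I can feed the tokens to a statistical model (e.g. a recurrent neural network).

However, for Python code with syntax errors, the built-in tokenizer (tokenize.py) produces ERRORTOKEN.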

This is the function I am using:

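import tokenize
from io import BytesIO
from typing import List

def to_token_list(s: str) -> List:
    tokens = []  # list of tokens extracted from source code.

    # tokenize.tokenize expects a readline callable that returns bytes.
    g = tokenize.tokenize(BytesIO(s.encode("utf-8")).readline)

    for t in g:
        tokens.append(t)

    return tokens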

Here is an example input. It has an unmatched closing parenthesis (the opening one is missing), and identifiers are masked as ID:

syntax_error_source_code = "\ndef ID ID ):\n    if ID .ID :\n        ID .ID .ID ()\n"
to_token_list(syntax_error_source_code)

Error:

Exception has occurred: TokenError
('EOF in multi-line statement', (5, 0))

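I can work around this error by wrapping the function in try-except, but that does not handle errors that occur partway through the source, as in the next example.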
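The try-except workaround described above might look like the following sketch. The helper name to_token_list_partial is hypothetical; it keeps whatever tokens were produced before the tokenizer raised and returns the exception alongside them. It also catches SyntaxError, because the tokenizer can raise IndentationError, which is a SyntaxError subclass. Everything after the failure point is lost, since a generator cannot be resumed once it has raised.

```python
import tokenize
from io import BytesIO
from typing import List, Optional, Tuple


def to_token_list_partial(
    s: str,
) -> Tuple[List[tokenize.TokenInfo], Optional[Exception]]:
    """Return the tokens read before any tokenizer error, plus that error."""
    tokens: List[tokenize.TokenInfo] = []
    error: Optional[Exception] = None
    try:
        for tok in tokenize.tokenize(BytesIO(s.encode("utf-8")).readline):
            tokens.append(tok)
    except (tokenize.TokenError, SyntaxError) as exc:
        # IndentationError is a subclass of SyntaxError, so it is caught here too.
        error = exc
    return tokens, error


# The first failing input from the question.
source = "\ndef ID ID ):\n    if ID .ID :\n        ID .ID .ID ()\n"
tokens, error = to_token_list_partial(source)
print(len(tokens), type(error).__name__)
```

Exactly where the tokenizer gives up, and therefore how many tokens are salvaged, may differ between Python versions.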

Another example, one that fails even with try-except:

syntax_error_source_code = '\ndef ID ():\n/    for ID ,ID in ID :\n        pass \n    for ID ,ID in ID :\n        pass \n'
to_token_list(syntax_error_source_code)

Error:

Exception has occurred: IndentationError
unindent does not match any outer indentation level (<tokenize>, line 5)

I found a discussion of this issue: https://bugs.python.org/issue12675

Is there a way around this?

Answer 1

I found a solution to this problem from someone who had taken the same course.

Here is the relevant part.

The tokenizer is:

import io
import tokenize
from typing import List

def tokenizer(
        s: str, id: int, error_dict: dict
    ) -> List[tokenize.TokenInfo]:
    fp = io.StringIO(s)
    # Token types that are dropped from the output.
    filter_types = [tokenize.ENCODING, tokenize.ENDMARKER, tokenize.ERRORTOKEN]
    tokens = []
    token_gen = tokenize.generate_tokens(fp.readline)
    while True:
        try:
            token = next(token_gen)
            # Skip empty-string tokens (e.g. DEDENT) and the filtered types.
            if token.string and token.type not in filter_types:
                tokens.append(token)
        except tokenize.TokenError:
            error_dict["TokenError"].append(id)
            break
        except StopIteration:
            break
        except IndentationError:
            error_dict["IndentationError"].append(id)
            # The generator is finished once it has raised, so stop here.
            break
    return tokens

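Some clarifications:

error_dict is a dictionary of the errors that may occur, mapping each error name to the list of ids that triggered it, for example: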

{"TokenError": [], "IndentationError": []}

s is the source code to tokenize, as a string.

id is the id of that source code within a larger database.
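To show how the pieces fit together, here is a sketch that runs this approach over the two failing inputs from the question. The function body is condensed from the tokenizer above; the name tokenize_with_errors, the corpus dictionary and its ids 0 and 1 are assumptions made for illustration.

```python
import io
import tokenize
from typing import Dict, List

SKIP_TYPES = (tokenize.ENCODING, tokenize.ENDMARKER, tokenize.ERRORTOKEN)


def tokenize_with_errors(
    s: str, id: int, error_dict: Dict[str, List[int]]
) -> List[tokenize.TokenInfo]:
    """Collect tokens until the tokenizer fails, recording the failure by id."""
    tokens: List[tokenize.TokenInfo] = []
    try:
        for token in tokenize.generate_tokens(io.StringIO(s).readline):
            if token.string and token.type not in SKIP_TYPES:
                tokens.append(token)
    except tokenize.TokenError:
        error_dict["TokenError"].append(id)
    except IndentationError:
        error_dict["IndentationError"].append(id)
    return tokens


# Hypothetical corpus: the two failing inputs from the question, keyed by id.
corpus = {
    0: "\ndef ID ID ):\n    if ID .ID :\n        ID .ID .ID ()\n",
    1: "\ndef ID ():\n/    for ID ,ID in ID :\n        pass \n"
       "    for ID ,ID in ID :\n        pass \n",
}
error_dict: Dict[str, List[int]] = {"TokenError": [], "IndentationError": []}
token_lists = {
    i: tokenize_with_errors(src, i, error_dict) for i, src in corpus.items()
}

for i, toks in token_lists.items():
    print(i, [t.string for t in toks][:6])
print(error_dict)
```

Each source still yields the tokens that precede its error, and error_dict records which ids failed and how, so the affected samples can be filtered out or inspected later.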
