Ignoring data with the ttp module in Python

Question

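I'll explain my problem with the example below. I can parse the following data with the configuration shown. When I use the

{{ignore}}

command, it lets me capture the line, because the line matches the right template, while ignoring the data I don't want.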

from ttp import ttp
import json

data_to_parse = """
1.peace in the world
2.peace in the world world 
3.peace in the world world world 
"""

To parse this data, I can use the following template:

ttp_template = """
<group name="Quote">
{{peace}} in the {{world}}
</group>
<group name="Quote">
{{peace}} in the {{world}} {{ignore}}
</group>
<group name="Quote">
{{peace}} in the {{world}} {{ignore}} {{ignore}}
</group>
"""

With the following code, I can parse the data as needed:

def parser(data_to_parse):

    # use a different local name so it doesn't shadow the function itself
    ttp_parser = ttp(data=data_to_parse, template=ttp_template)
    ttp_parser.parse()

    # get the results as JSON strings and take the first one
    results = ttp_parser.result(format='json')[0]
    #print(results)

    # convert the JSON string into Python objects
    result = json.loads(results)

    print(result)

parser(data_to_parse)

See my output:

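The problem is that I can't predict how many times "world" appears on each line, and I don't want to keep adding {{ignore}} commands just to capture the lines I need while skipping the words I don't want. For example, if I add the following line to the data, the template above will not capture it, and I would have to add yet another {{ignore}} to match it: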

4.peace in the world world world world
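The workaround above grows by one `<group>` for every extra word. As a stopgap, the repeated groups can at least be generated instead of typed by hand. This is a minimal sketch: `build_template` and `max_extra_words` are hypothetical names, it only builds the template string shown earlier, and any line with more trailing words than the chosen maximum would still be missed.

```python
def build_template(max_extra_words: int) -> str:
    """Return a TTP template string with one <group name="Quote"> for each
    possible number of trailing words, from 0 up to max_extra_words."""
    groups = []
    for n in range(max_extra_words + 1):
        # One " {{ignore}}" per extra word after the captured {{world}}.
        line = "{{peace}} in the {{world}}" + " {{ignore}}" * n
        groups.append('<group name="Quote">\n' + line + "\n</group>")
    return "\n".join(groups)


if __name__ == "__main__":
    # Covers lines with up to three extra "world" words (e.g. line 4 above).
    print(build_template(3))
```

`build_template(2)` reproduces the three-group template from the question; it does not remove the underlying limitation, which is why a pattern-based approach is preferable.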

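As far as I know, ttp splits words on each space. For example, if the words were joined with underscores ("_") instead of spaces, as below: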

3.peace in the world_world_world

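I could capture the data with a single simple line in the template. In my data, however, the words are separated by spaces, and I need to capture those lines.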

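So the question is: is there a way to simplify this? As you can see, I have a workaround, but I'm looking for a simpler way to solve the problem. Many thanks for any suggestions.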

Answer 1

I found a solution to this problem:

{{ name | PHRASE }}

or

{{ name | ORPHRASE }}

can be used for this purpose.

{{ name | PHRASE }}

This pattern matches any phrase: a collection of words separated by a single space character, such as "word1 word2 word3".

{{ name | ORPHRASE }}

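In many cases the data to be extracted can be either a single word or a phrase; the most prominent examples are descriptions of various kinds, such as interface descriptions, BGP peer descriptions, etc. ORPHRASE allows such data to be matched and extracted.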
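To see why ORPHRASE rather than PHRASE fits this data, here is a stdlib-only sketch. The two regexes are approximations of TTP's PHRASE and ORPHRASE patterns written for illustration (an assumption, not taken from TTP's API), and `match_world` is a hypothetical helper that mimics a `{{ peace }} in the {{ world | ... }}` template line.

```python
import re
from typing import Optional

# ASSUMPTION: approximations of TTP's built-in PHRASE and ORPHRASE patterns,
# for illustration only; check the TTP docs/source for the exact regexes.
PHRASE = r"(?:\S+ )+?\S+"        # two or more words separated by single spaces
ORPHRASE = r"\S+|(?:\S+ )+?\S+"  # a single word, or a PHRASE


def match_world(value_pattern: str, line: str) -> Optional[str]:
    """Return what the 'world' slot captures for this line, or None if the
    line does not match '<peace> in the <world>'."""
    line_re = re.compile(r"(?P<peace>\S+) in the (?P<world>" + value_pattern + r")")
    m = line_re.fullmatch(line.strip())
    return m.group("world") if m else None


lines = [
    "1.peace in the world",
    "2.peace in the world world",
    "3.peace in the world world world",
    "4.peace in the world world world world",
]

for line in lines:
    print(f"{line!r}: PHRASE={match_world(PHRASE, line)!r}, "
          f"ORPHRASE={match_world(ORPHRASE, line)!r}")
```

Under these assumptions, PHRASE returns None for line 1 (a single "world" is not a phrase), while ORPHRASE captures "world" there and the whole run, such as "world world world world", on line 4. In TTP the corresponding template line would be `{{ peace }} in the {{ world | ORPHRASE }}`, which should keep the extra words inside `world` rather than discarding them.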

Answer 2

PHRASE matches words separated by a single space.

If you need to, you can also use a regular expression in the ignore statement, so that it matches all remaining characters and ignores them:

{{ ignore('.*') }}

- ignores everything (any character "." repeated zero or more times "*") up to the end of the line.

So in your case you could also use:

{{ peace }} in the {{ world }}{{ ignore('.*') }}

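Note that there is no space between {{ world }} and {{ ignore('.*') }}; this is what lets the template also match the first entry, where nothing follows "world".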

Python 3.11.2 (main, Jun  6 2023, 07:39:01) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ttp import ttp
>>> import json
>>> data_to_parse = """
... 1.peace in the world
... 2.peace in the world world
... 3.peace in the world world world
... """
>>> ttp_template = """
... <group name="Quote">
... {{ peace }} in the {{ world }}{{ ignore('.*') }}
... </group>
... """
>>> parser = ttp(data=data_to_parse, template=ttp_template)
>>> parser.parse()
>>> results = parser.result(format='json')[0]
>>> result = json.loads(results)
>>> result
[{'Quote': [{'peace': '1.peace', 'world': 'world'}, {'peace': '2.peace', 'world': 'world'}, {'peace': '3.peace', 'world': 'world'}]}]
>>> exit()
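The remark about the missing space can be illustrated with plain `re`. This sketch assumes, for illustration only, that a plain `{{ world }}` behaves like `\S+`, that a literal space in the template must match a space in the data, and that `ignore('.*')` behaves like `.*`; `WITH_SPACE`, `NO_SPACE` and `parse_lines` are hypothetical names, not TTP internals.

```python
import re

# ASSUMED regex stand-ins for the two template variants:
#   "{{ peace }} in the {{ world }} {{ ignore('.*') }}"  (space before ignore)
#   "{{ peace }} in the {{ world }}{{ ignore('.*') }}"   (no space)
WITH_SPACE = re.compile(r"(?P<peace>\S+) in the (?P<world>\S+) .*")
NO_SPACE = re.compile(r"(?P<peace>\S+) in the (?P<world>\S+).*")

lines = [
    "1.peace in the world",
    "2.peace in the world world ",
    "3.peace in the world world world ",
]


def parse_lines(pattern: re.Pattern, lines: list) -> list:
    """Return a dict of captured fields for every line the pattern fully matches."""
    results = []
    for line in lines:
        m = pattern.fullmatch(line.strip())
        if m:
            results.append(m.groupdict())
    return results


print("with space:", parse_lines(WITH_SPACE, lines))
print("no space:  ", parse_lines(NO_SPACE, lines))
```

With the space, line 1 is dropped because nothing follows "world"; without it, `.*` can match the empty string, so all three lines match with `world` equal to "world", the same shape as the REPL output above.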
