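I have an object with a short string attribute and a long multi-line string attribute. I want to write the short string as a YAML quoted scalar and the multi-line string as a literal scalar: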
my_obj.short = "Hello"
my_obj.long = "Line1\nLine2\nLine3"
I would like the YAML to look like this:
short: "Hello"
long: |
  Line1
  Line2
  Line3
How can I instruct PyYAML to do this? If I call yaml.dump(my_obj), it produces dict-like output:
{long: 'line1

    line2

    line3

    ', short: Hello}
(I'm not sure why long is double-spaced like that...)
Can I tell PyYAML how to handle my attributes? I'd like to control both the order and the style.
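One way to control both order and style for the object itself is to register a representer for its class that builds the mapping node by hand. This is a minimal sketch. MyObj stands in for the question's my_obj, and MyDumper is a Dumper subclass so the registration doesn't touch the global yaml.Dumper. Both names are hypothetical.

```python
import yaml

STR_TAG = 'tag:yaml.org,2002:str'
MAP_TAG = 'tag:yaml.org,2002:map'


class MyObj:
    """Hypothetical stand-in for the question's my_obj."""

    def __init__(self, short, long):
        self.short = short
        self.long = long


class MyDumper(yaml.Dumper):
    """Hypothetical Dumper subclass, so the representer is not registered globally."""


def represent_my_obj(dumper, obj):
    # Build the nodes directly: style='"' forces a double-quoted scalar,
    # style='|' requests a literal block scalar.
    pairs = [
        (yaml.ScalarNode(STR_TAG, 'short'), yaml.ScalarNode(STR_TAG, obj.short, style='"')),
        (yaml.ScalarNode(STR_TAG, 'long'), yaml.ScalarNode(STR_TAG, obj.long, style='|')),
    ]
    # The order of this list is the output order; nothing here is sorted.
    return yaml.MappingNode(MAP_TAG, pairs)


MyDumper.add_representer(MyObj, represent_my_obj)

my_obj = MyObj("Hello", "Line1\nLine2\nLine3\n")
print(yaml.dump(my_obj, Dumper=MyDumper), end="")
```

The literal style is only a request. If the emitter cannot write a block scalar, it falls back to a quoted style; a line with trailing spaces is one example.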
Building on @lbt's approach, which I loved, I arrived at this code:
import yaml

def str_presenter(dumper, data):
    if len(data.splitlines()) > 1:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)

# to use with safe_dump:
yaml.representer.SafeRepresenter.add_representer(str, str_presenter)
It makes every multiline string a block literal.
I tried to avoid the monkey-patching part. Full credit to @lbt and @J.F.Sebastian.
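The same presenter can also be attached to a dedicated Dumper subclass instead of the global yaml.Dumper and SafeRepresenter. That leaves other code that dumps YAML unaffected. This is a sketch that assumes PyYAML 5.1 or newer, which is needed for sort_keys=False. LiteralDumper is a hypothetical name.

```python
import yaml

STR_TAG = 'tag:yaml.org,2002:str'


def str_presenter(dumper, data):
    # Multi-line strings become literal block scalars; everything else keeps the default style.
    if len(data.splitlines()) > 1:
        return dumper.represent_scalar(STR_TAG, data, style='|')
    return dumper.represent_scalar(STR_TAG, data)


class LiteralDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass; registering here leaves yaml.SafeDumper untouched."""


LiteralDumper.add_representer(str, str_presenter)

data = {'short': 'Hello', 'long': 'Line1\nLine2\nLine3\n'}
print(yaml.dump(data, Dumper=LiteralDumper, sort_keys=False), end="")
```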
Based on "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?":
import yaml
from collections import OrderedDict

class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))

print(yaml.dump(d))
short: "Hello"
long: |
  Line1
  Line2
  Line3
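On Python 3.7+ with PyYAML 5.1+, plain dicts keep insertion order and yaml.dump accepts sort_keys=False, so the OrderedDict representer can be dropped. This sketch works under those version assumptions. It registers quoted and literal on StyledDumper, a hypothetical SafeDumper subclass, instead of globally.

```python
import yaml

STR_TAG = 'tag:yaml.org,2002:str'


class quoted(str):
    """Marker type: dump as a double-quoted scalar."""


class literal(str):
    """Marker type: dump as a literal block scalar."""


class StyledDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass holding the two representers."""


StyledDumper.add_representer(
    quoted, lambda dumper, data: dumper.represent_scalar(STR_TAG, data, style='"'))
StyledDumper.add_representer(
    literal, lambda dumper, data: dumper.represent_scalar(STR_TAG, data, style='|'))

# A plain dict: its insertion order is kept because sort_keys=False.
d = {'short': quoted("Hello"), 'long': literal("Line1\nLine2\nLine3\n")}
print(yaml.dump(d, Dumper=StyledDumper, sort_keys=False), end="")
```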
I wanted any input containing \n to become a block literal. Using the code in yaml/representer.py as a basis, I got:
# -*- coding: utf-8 -*-
import yaml

def should_use_block(value):
    # newline, CR, the \x1c-\x1e separators, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR
    for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029":
        if c in value:
            return True
    return False

def my_represent_scalar(self, tag, value, style=None):
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style

    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node

a = {'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}

print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
print("After override")
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
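Output (from an older PyYAML; since 5.1 yaml.dump defaults to block style, so the first two dumps look different today):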
{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}
{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}
After override
multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello
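The same idea works without patching BaseRepresenter globally. You override represent_scalar in a Dumper subclass and defer to the parent implementation. In this sketch, BlockDumper and LINE_BREAKS are hypothetical names. As the output above shows, the emitter still falls back to a quoted style when a block scalar isn't possible, for example non-ASCII text without allow_unicode=True.

```python
import yaml

# Same characters that should_use_block() checks for.
LINE_BREAKS = "\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029"


class BlockDumper(yaml.SafeDumper):
    """Hypothetical Dumper: any scalar containing a line break is requested as a literal block."""

    def represent_scalar(self, tag, value, style=None):
        if style is None and any(c in value for c in LINE_BREAKS):
            style = '|'
        # The parent fills in default_style and records the node for aliasing.
        return super().represent_scalar(tag, value, style)


a = {'short': 'Hello', 'multiline': 'Line1\nLine2\nLine3\n'}
print(yaml.dump(a, Dumper=BlockDumper), end="")
```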
ruamel.yaml and its RoundTripLoader/Dumper (disclaimer: I am the author of that package) can do what you want. It also supports the YAML 1.2 specification (from 2009) and has several other improvements:
import sys
from ruamel.yaml import YAML

yaml_str = """\
short: "Hello" # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
"""

yaml = YAML()
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
Gives:
short: "Hello" # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
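(including the comment, which starts in the same column as before)
You can also create this output from scratch, but then you do need to provide extra information, such as the explicit positions at which to fold.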
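It is worth noting that PyYAML does not allow trailing spaces in block scalars and will force such content into double-quoted style. Many people seem to have run into this. If you don't care about being able to round-trip the data, this strips those trailing spaces: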
def str_presenter(dumper, data):
    if len(data.splitlines()) > 1 or '\n' in data:
        text_list = [line.rstrip() for line in data.splitlines()]
        fixed_data = "\n".join(text_list)
        return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)
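Here is a variation on the presenter above that also keeps a final newline, so the scalar is written as | rather than |-. It is registered on StripDumper, a hypothetical SafeDumper subclass, instead of globally. As above, trailing whitespace is silently dropped, so the data still does not round-trip exactly.

```python
import yaml

STR_TAG = 'tag:yaml.org,2002:str'


def str_presenter(dumper, data):
    if len(data.splitlines()) > 1 or '\n' in data:
        # Strip trailing whitespace from every line so PyYAML can use a block scalar.
        fixed = "\n".join(line.rstrip() for line in data.splitlines())
        if data.endswith("\n"):
            fixed += "\n"  # keep the final newline (clip chomping, plain '|')
        return dumper.represent_scalar(STR_TAG, fixed, style='|')
    return dumper.represent_scalar(STR_TAG, data)


class StripDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass carrying the presenter."""


StripDumper.add_representer(str, str_presenter)

doc = {'text': 'a  \nb\n'}
print(yaml.dump(doc, Dumper=StripDumper), end="")
```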
Using ruamel.yaml, posted by Anthon here, these are some simple functions for converting YAML text to a dict and back, which you can conveniently keep among your utility functions:
from ruamel.yaml import YAML
from io import StringIO

def yaml2dict(y):
    return YAML().load(y)

def dict2yaml(d):
    output_stream = StringIO()
    YAML().dump(d, output_stream)
    return output_stream.getvalue()
An example of multi-line YAML to convert to a dict:
y = """
title: organelles absent in animal cells and present in a plant cell
question: |
  Observe the following table and identify if the cell is of a plant or an animal
  | Organelle | Present/Absent |
  |---------- | -------------- |
  | Nucleus | Present |
  | Vacuole | Present |
  | Cellwall | Absent |
  | Cell membrane | Present |
  | Mitochondria | Present |
  | Chlorophyll | Absent |
answer_type: MCQ_single
choices:
- Plant
- Animal
points: 1
"""
d = yaml2dict(y)
d
Output:
{'title': 'organelles absent in animal cells and present in a plant cell', 'question': 'Observe the following table and identify if the cell is of a plant or an animal\n| Organelle | Present/Absent | \n|---------- | -------------- | \n| Nucleus | Present |\n| Vacuole | Present |\n| Cellwall | Absent |\n| Cell membrane | Present |\n| Mitochondria | Present |\n| Chlorophyll | Absent |\n', 'answer_type': 'MCQ_single', 'choices': ['Plant', 'Animal'], 'points': 1}
Converting it back to YAML:
y2 = dict2yaml(d)
print(y2)
Output:
title: organelles absent in animal cells and present in a plant cell
question: |
  Observe the following table and identify if the cell is of a plant or an animal
  | Organelle | Present/Absent |
  |---------- | -------------- |
  | Nucleus | Present |
  | Vacuole | Present |
  | Cellwall | Absent |
  | Cell membrane | Present |
  | Mitochondria | Present |
  | Chlorophyll | Absent |
answer_type: MCQ_single
choices:
- Plant
- Animal
points: 1
For completeness, install ruamel.yaml with:
pip install ruamel.yaml
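If you'd rather stay with PyYAML, you can get a rough equivalent of yaml2dict/dict2yaml by combining safe_load with a literal-block presenter. This sketch assumes PyYAML 5.1+ for sort_keys. _BlockDumper and _str_presenter are hypothetical names. Unlike ruamel.yaml, this approach drops comments and original quoting, and lines with trailing spaces will still come out double-quoted.

```python
import yaml

STR_TAG = 'tag:yaml.org,2002:str'


class _BlockDumper(yaml.SafeDumper):
    """Hypothetical dumper: multi-line strings as literal blocks."""


def _str_presenter(dumper, data):
    style = '|' if '\n' in data else None
    return dumper.represent_scalar(STR_TAG, data, style=style)


_BlockDumper.add_representer(str, _str_presenter)


def yaml2dict(y):
    return yaml.safe_load(y)


def dict2yaml(d):
    # sort_keys=False keeps the dict's insertion order.
    return yaml.dump(d, Dumper=_BlockDumper, sort_keys=False, default_flow_style=False)


y = """\
title: demo
question: |
  line one
  line two
choices:
- Plant
- Animal
points: 1
"""
print(dict2yaml(yaml2dict(y)) == y)
```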