I have a dictionary like this:
{ "id" : "abcde",
  "key1" : "blah",
  "key2" : "blah blah",
  "nestedlist" : [
    { "id" : "qwerty",
      "nestednestedlist" : [
        { "id" : "xyz",
          "keyA" : "blah blah blah" },
        { "id" : "fghi",
          "keyZ" : "blah blah blah" }],
      "anothernestednestedlist" : [
        { "id" : "asdf",
          "keyQ" : "blah blah" },
        { "id" : "yuiop",
          "keyW" : "blah" }] } ] }
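Basically, it is a dictionary of arbitrary depth containing nested lists, dictionaries, and strings.
What is the best way of traversing this to extract the values of every "id" key? I want to achieve the equivalent of an XPath query like "//id". The value of "id" is always a string.
So from my example, the output I need is basically: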
["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"]
Order is not important.
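I found this Q/A very interesting, since it provides several different solutions to the same problem. I took all of these functions and tested them with a complex dictionary object. I had to drop two functions from the test, because they produced too many failed results and did not support returning lists or dicts as values, which I consider essential, since a function should be prepared for almost any data that may come its way. I then ran the remaining functions through the timeit module with 100,000 iterations each, and the output was as follows: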
0.11 usec/pass on gen_dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6.03 usec/pass on find_all_items(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.15 usec/pass on findkeys(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.79 usec/pass on get_recursively(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.14 usec/pass on find(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.36 usec/pass on dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
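For reference, a measurement like the one above could be set up roughly as follows. This is only a sketch, not the benchmark script that produced the numbers: the `usec_per_pass` helper, the small `sample` dict, and the stand-in `find_values` function are assumptions made for illustration. Only `timeit.timeit` is the standard-library call.

```python
import timeit

def find_values(key, node):
    """Stand-in search function (hypothetical): yield every value stored under `key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(key, v)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(key, item)

def usec_per_pass(func, key, obj, number=100_000):
    """Return the average duration of one full search, in microseconds."""
    # list() consumes the generator, so the search itself is timed and not
    # just the creation of the generator object.
    total_seconds = timeit.timeit(lambda: list(func(key, obj)), number=number)
    return total_seconds / number * 1e6

# Small illustrative input (an assumption, not the dict used in the test).
sample = {"logging": {"level": "DEBUG"}, "plan": [[4, 5, 4], {"logging": "x"}]}

print("%.2f usec/pass on find_values(k,o)" % usec_per_pass(find_values, "logging", sample))
```

One design point worth keeping in mind: calling a generator function only creates a generator object and does not run its body. Whether the measurement above consumed the generators is not stated; if it did not, generator-based functions would look much faster than functions that build and return a list.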
All functions were given the same needle to search for ('logging') and the same dictionary object, which is constructed like this:
o = { 'temparature': '50',
      'logging': {
        'handlers': {
          'console': {
            'formatter': 'simple',
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stdout',
            'level': 'DEBUG'
          }
        },
        'loggers': {
          'simpleExample': {
            'handlers': ['console'],
            'propagate': 'no',
            'level': 'INFO'
          },
          'root': {
            'handlers': ['console'],
            'level': 'DEBUG'
          }
        },
        'version': '1',
        'formatters': {
          'simple': {
            'datefmt': "'%Y-%m-%d %H:%M:%S'",
            'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
          }
        }
      },
      'treatment': {'second': 5, 'last': 4, 'first': 4},
      'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]]
}
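All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my own, adapted from the functions here. It is actually very much like Alfe's find function, with the main difference that I check whether the given object has an iteritems function, in case strings are passed in during recursion: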
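def gen_dict_extract(key, var):
    # Python 2 code: on Python 3 check for 'items' and call var.items() instead.
    # The hasattr guard skips strings and other non-dict objects.
    if hasattr(var, 'iteritems'):
        for k, v in var.iteritems():
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract(key, d):
                        yield result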
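On Python 3, dict.iteritems no longer exists, so the function above needs items instead. The following is a sketch of such a port; the name `gen_dict_extract_py3` and the small `config` sample are assumptions for illustration. It keeps the hasattr guard, which is what lets the recursion skip plain strings inside lists such as ['console'].

```python
def gen_dict_extract_py3(key, var):
    """Yield every value stored under `key` anywhere inside `var`."""
    # Only dict-like objects have .items(); strings, numbers and lists
    # passed in during recursion fall through and yield nothing.
    if hasattr(var, "items"):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract_py3(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract_py3(key, d):
                        yield result

# Illustrative input (an assumption), shaped like the logging config above.
config = {
    "level": "WARNING",
    "handlers": ["console"],  # list of plain strings: skipped safely
    "loggers": {"root": {"handlers": ["console"], "level": "DEBUG"}},
}

print(list(gen_dict_extract_py3("level", config)))  # ['WARNING', 'DEBUG']
```

Like the Python 2 version, this port does not descend into a list that sits directly inside another list (for example 'treatment_plan' above), because a list has no items method.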
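So this variant is the fastest and safest of the functions here. find_all_items is incredibly slow, far behind the second slowest, get_recursively, while the rest, except dict_extract, are close to each other. The functions fun and keyHole only work if you are looking for strings.
An interesting learning experience :)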
d = { "id" : "abcde",
      "key1" : "blah",
      "key2" : "blah blah",
      "nestedlist" : [
        { "id" : "qwerty",
          "nestednestedlist" : [
            { "id" : "xyz", "keyA" : "blah blah blah" },
            { "id" : "fghi", "keyZ" : "blah blah blah" }],
          "anothernestednestedlist" : [
            { "id" : "asdf", "keyQ" : "blah blah" },
            { "id" : "yuiop", "keyW" : "blah" }] } ] }

def fun(d):
    if 'id' in d:
        yield d['id']
    for k in d:
        if isinstance(d[k], list):
            for i in d[k]:
                for j in fun(i):
                    yield j

def find(key, value):
    # Python 2 code: on Python 3 use value.items() instead of iteritems().
    for k, v in value.iteritems():
        if k == key:
            yield v
        elif isinstance(v, dict):
            for result in find(key, v):
                yield result
        elif isinstance(v, list):
            for d in v:
                for result in find(key, d):
                    yield result

d = { "id" : "abcde",
      "key1" : "blah",
      "key2" : "blah blah",
      "nestedlist" : [
        { "id" : "qwerty",
          "nestednestedlist" : [
            { "id" : "xyz", "keyA" : "blah blah blah" },
            { "id" : "fghi", "keyZ" : "blah blah blah" }],
          "anothernestednestedlist" : [
            { "id" : "asdf", "keyQ" : "blah blah" },
            { "id" : "yuiop", "keyW" : "blah" }] } ] }

def findkeys(node, kv):
    if isinstance(node, list):
        for i in node:
            for x in findkeys(i, kv):
                yield x
    elif isinstance(node, dict):
        if kv in node:
            yield node[kv]
        for j in node.values():
            for x in findkeys(j, kv):
                yield x

print(list(findkeys(d, 'id')))
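
Just wanted to iterate on @hexerei-software's excellent answer by using yield from and accepting top-level lists.
Note: this version does not consider lists.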
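The code for the yield from variant is not included here. A minimal sketch of what such a function might look like is given below, assuming Python 3.3 or later. The name `extract_values` and the `records` sample are hypothetical, and this is not the code from the referenced answer. This sketch does descend into lists, including a list at the top level.

```python
def extract_values(key, var):
    """Yield every value stored under `key` in nested dicts and lists."""
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            # Recurse into containers only; strings and numbers are leaves.
            if isinstance(v, (dict, list)):
                yield from extract_values(key, v)
    elif isinstance(var, list):
        # Handles a top-level list as well as lists nested inside lists.
        for item in var:
            yield from extract_values(key, item)

# Illustrative input (an assumption): a top-level list of records.
records = [
    {"id": "abcde", "nestedlist": [{"id": "qwerty"}, "not a dict"]},
    {"id": "xyz", "matrix": [[{"id": "fghi"}]]},
]

print(list(extract_values("id", records)))  # ['abcde', 'qwerty', 'xyz', 'fghi']
```

Using yield from removes the inner "for result in ...: yield result" loops, and moving the list check to the top of the function is what allows a list to be passed in directly.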