I'm looking for a way to dynamically build a dictionary of dictionaries in Python from a desired structure.
I have the following data:
{'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'], 'lateness': ['ontime', 'delayed']}
I specify the structure I want like this:
['weather', 'season', 'lateness']
and want to end up with the data in this format:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
This is the manual approach I came up with:
dtree = {}
for cat1 in category_cases['weather']:
    dtree.setdefault(cat1, {})
    for cat2 in category_cases['season']:
        dtree[cat1].setdefault(cat2, {})
        for cat3 in category_cases['lateness']:
            dtree[cat1][cat2].setdefault(cat3, 0)
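Can you think of a way to change only the structure I wrote and still get the desired result? Keep in mind that the structure may not be the same every time.
Also, if you can think of a way other than dictionaries that still lets me access the results, that would work for me too.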
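If you don't mind using an external package, pandas.DataFrame might be a viable option, since it looks like you will be working with a table (here d is the data dictionary from the question):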
import pandas as pd
df = pd.DataFrame(
    index=pd.MultiIndex.from_product([d['weather'], d['season']]),
    columns=d['lateness'], data=0
)
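Result:
              ontime  delayed
windy summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
You can easily make changes using indexing:
# Use a single .loc call per assignment to avoid chained assignment
df.loc[('windy', 'summer'), 'ontime'] = 1
df.loc[('calm', 'autumn'), 'delayed'] = 2
# Result:
              ontime  delayed
windy summer       1        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        2
If you will always use the last key for the columns, you can construct the table dynamically, assuming your keys are in the desired insertion order:
df = pd.DataFrame(
    index=pd.MultiIndex.from_product(list(d.values())[:-1]),
    columns=list(d.values())[-1], data=0
)
Since you're interested in pandas, and given your structure, I would also recommend reading up on MultiIndex and Advanced Indexing to understand how to work with your data. It is certainly very handy and versatile, but the framework can take some time to get used to, so you may want to read up on it first. Here are some examples: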
# The outputs below assume df.loc[('calm', 'summer'), 'delayed'] was additionally set to 5
# Filters all the 'delayed' data in 'calm'
df.loc['calm', 'delayed']
# summer 5
# winter 0
# spring 0
# autumn 2
# Name: delayed, dtype: int64
# Apply a sum to get the total of 'delayed' items in all of 'calm':
df.loc['calm', 'delayed'].sum()
# 7
# Gets the mean of all 'summer' (notice the `slice(None)` is required to return all of the 'calm' and 'windy' group)
df.loc[(slice(None), 'summer'), :].mean()
# ontime 0.5
# delayed 2.5
# dtype: float64
Otherwise, if you still prefer dict, there is nothing wrong with that. Here is a recursive function that generates one based on the given keys (assuming your keys are in the desired insertion order):
def gen_dict(d, level=0):
    # Past the last key: this is a leaf, so start the count at 0
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    # One sub-dictionary per value of the key at this level
    return {val: gen_dict(d, level+1) for val in d.get(key)}
gen_dict(d)
Result:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
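Once a nested dictionary like the one above exists, a leaf count can be reached by following one value per level. As a minimal sketch (the helper name `increment` and the small sample tree are assumptions made for illustration, not part of the answer above), `functools.reduce` can walk down to the innermost dictionary regardless of how many levels the structure has:

```python
from functools import reduce

def increment(tree, path, amount=1):
    """Add `amount` to the leaf reached by following `path` through `tree`."""
    *parents, leaf = path
    # Walk down one level per key; an unknown key raises KeyError
    innermost = reduce(lambda node, key: node[key], parents, tree)
    innermost[leaf] += amount
    return innermost[leaf]

# A small tree shaped like the result above (two levels, for brevity)
tree = {'calm': {'delayed': 0, 'ontime': 0},
        'windy': {'delayed': 0, 'ontime': 0}}

increment(tree, ('calm', 'delayed'))
increment(tree, ('calm', 'delayed'), 4)
print(tree['calm']['delayed'])  # 5
```

Because the path is just a tuple, the same helper should work unchanged when the structure gains or loses a level.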
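I think this might be useful to you:
def get_output(category, order, i=0):
    output = {}
    # order[i:i+1] is empty once i runs past the last key
    for key in order[i:i+1]:
        for value in category[key]:
            output[value] = get_output(category, order, i+1)
    # No key left at this depth: return the leaf count
    if output == {}:
        return 0
    return output
You can use itertools.product to get the Cartesian product of the dictionary values (assuming you want the same key order). We can then loop through every key except the last with setdefault, and finally set the innermost key to a count of 0.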
This can be done with the following code:
from itertools import product
from pprint import pprint
d = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}
result = {}
# Get every combination
for comb in product(*d.values()):
    # Get current level of dictionary
    current = result
    # Go through each key except last
    # Set dictionaries if we find new key
    for key in comb[:-1]:
        current = current.setdefault(key, {})
    # Set innermost dictionary to 0 count
    current[comb[-1]] = 0
pprint(result)
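Since the question is also open to alternatives to nested dictionaries, one option is a flat dictionary keyed by tuples, built from an explicit order list so that it does not depend on the dictionary's key order. This is only a sketch; the names `cases`, `order`, and `counts` are illustrative assumptions rather than anything from the answers above:

```python
from itertools import product

cases = {
    "weather": ["windy", "calm"],
    "season": ["summer", "winter", "spring", "autumn"],
    "lateness": ["ontime", "delayed"],
}
# Any permutation of the keys; this is the "structure" from the question
order = ["season", "weather", "lateness"]

# One zero-initialised counter per combination, keyed by a tuple in `order`
counts = {combo: 0 for combo in product(*(cases[key] for key in order))}

counts[("summer", "calm", "delayed")] += 1
print(len(counts))  # 16 combinations (4 * 2 * 2)

# Aggregate over one level by filtering on the tuple position
summer_total = sum(v for (season, _, _), v in counts.items() if season == "summer")
print(summer_total)  # 1
```

The trade-off is that a flat mapping makes updates and aggregation simple, while a nested dictionary is easier to hand to code that expects to drill down level by level.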
Hope this helps!