Given this dictionary:
dictionary = {
    "base_name": ["abc", "abc_2"],
    "conditional": {
        "base_name": "abc_cond",
        "conditions": [{"func": "==", "ref": 1}],
        "res": -0.044
    },
    "outlier": 0.232,
    "out_name": "var",
    "transformation": [
        {"func": "==", "ref": -1, "res": -0.323},
        {"func": "<=", "ref": 23, "res": -0.123},
        {"func": ">", "ref": -5, "res": -0.433},
        {"func": "else", "res": -0.663},
    ]
}
What is the best way to extract these values into a pandas DataFrame?
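Something like this should do it. I'm not 100% sure about "outlier", though. Is the key always called that, or do you need to work out what the other key is (in this case, outlier)?
Is there a list of dictionaries to process, or only this one?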
import pandas as pd

dictionary = {
    "base_name": ["abc", "abc_2"],
    "conditional": {
        "base_name": "abc_cond",
        "conditions": [{"func": "==", "ref": 1}],
        "res": -0.044
    },
    "outlier": 0.232,
    "out_name": "var",
    "transformation": [
        {"func": "==", "ref": -1, "res": -0.323},
        {"func": "<=", "ref": 23, "res": -0.123},
        {"func": ">", "ref": -5, "res": -0.433},
        {"func": "else", "res": -0.663},
    ]
}

data = []

# Root level
row = {
    "name": dictionary["out_name"],
    "org_name": dictionary["base_name"],
    "func": "outlier",
    "ref": "",
    "res": dictionary["outlier"]
}
data.append(row)

# Conditional
row = {
    "name": dictionary["out_name"],
    "org_name": dictionary["conditional"]["base_name"],
    "func": dictionary["conditional"]["conditions"][0]["func"],
    "ref": dictionary["conditional"]["conditions"][0]["ref"],
    "res": dictionary["conditional"]["res"]
}
data.append(row)

# Each transformation
for transformation in dictionary["transformation"]:
    row = {
        "name": dictionary["out_name"],
        "org_name": dictionary["base_name"],
        "func": transformation["func"],
        "ref": transformation.get("ref", ""),
        "res": transformation["res"]
    }
    data.append(row)

df = pd.DataFrame(data=data)
print(df)
Output:
  name      org_name     func ref    res
0  var  [abc, abc_2]  outlier      0.232
1  var      abc_cond       ==   1 -0.044
2  var  [abc, abc_2]       ==  -1 -0.323
3  var  [abc, abc_2]       <=  23 -0.123
4  var  [abc, abc_2]        >  -5 -0.433
5  var  [abc, abc_2]     else     -0.663
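If it does turn out that there is a list of dictionaries, the same row-building steps could be wrapped in a function and applied to each one. This is only a sketch under a few assumptions: the helper names `dictionary_to_rows` and `dictionaries_to_frame` are made up for illustration, the key is assumed to always be called "outlier", and every entry in "conditions" gets its own row (the code above only reads the first one).

```python
import pandas as pd

COLUMNS = ["name", "org_name", "func", "ref", "res"]


def dictionary_to_rows(d):
    """Flatten one dictionary of the shape shown above into a list of rows.

    Hypothetical helper: assumes the keys "out_name", "base_name" and
    "outlier" are always present.
    """
    name = d["out_name"]

    # Root level
    rows = [{"name": name, "org_name": d["base_name"],
             "func": "outlier", "ref": "", "res": d["outlier"]}]

    # Conditional: one row per condition (assumption, see lead-in)
    cond = d.get("conditional")
    if cond is not None:
        for c in cond["conditions"]:
            rows.append({"name": name, "org_name": cond["base_name"],
                         "func": c["func"], "ref": c.get("ref", ""),
                         "res": cond["res"]})

    # Each transformation
    for t in d.get("transformation", []):
        rows.append({"name": name, "org_name": d["base_name"],
                     "func": t["func"], "ref": t.get("ref", ""),
                     "res": t["res"]})
    return rows


def dictionaries_to_frame(dictionaries):
    """Build a single DataFrame from any number of such dictionaries."""
    rows = [row for d in dictionaries for row in dictionary_to_rows(d)]
    return pd.DataFrame(rows, columns=COLUMNS)


dictionary = {
    "base_name": ["abc", "abc_2"],
    "conditional": {
        "base_name": "abc_cond",
        "conditions": [{"func": "==", "ref": 1}],
        "res": -0.044,
    },
    "outlier": 0.232,
    "out_name": "var",
    "transformation": [
        {"func": "==", "ref": -1, "res": -0.323},
        {"func": "<=", "ref": 23, "res": -0.123},
        {"func": ">", "ref": -5, "res": -0.433},
        {"func": "else", "res": -0.663},
    ],
}

df = dictionaries_to_frame([dictionary])
print(df)
```

For the single dictionary from the question this should give the same six rows as above; passing a longer list simply stacks the rows of each dictionary in order.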
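Your attempt only gets the transformation part of the dictionary. You need to fetch the other parts of the dictionary explicitly as well.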
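Also, set the ref value to an empty string before unpacking the transformation dict. Otherwise the missing ref ends up as NaN and the ints can be implicitly converted to floats.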
import pandas as pd

dictionary = {
    "base_name": ["abc", "abc_2"],
    "conditional": {
        "base_name": "abc_cond",
        "conditions": [{"func": "==", "ref": 1}],
        "res": -0.044
    },
    "outlier": 0.232,
    "out_name": "var",
    "transformation": [
        {"func": "==", "ref": -1, "res": -0.323},
        {"func": "<=", "ref": 23, "res": -0.123},
        {"func": ">", "ref": -5, "res": -0.433},
        {"func": "else", "res": -0.663},
    ]
}

data = []

# Root level
data.append({
    "name": dictionary["out_name"],
    "org_name": dictionary["base_name"],
    "func": "outlier",
    "ref": "",
    "res": dictionary["outlier"]
})

# Conditional
data.append({
    "name": dictionary["out_name"],
    "org_name": dictionary["conditional"]["base_name"],
    "func": dictionary["conditional"]["conditions"][0]["func"],
    "ref": dictionary["conditional"]["conditions"][0]["ref"],
    "res": dictionary["conditional"]["res"]
})

# Transformations
for transformation in dictionary["transformation"]:
    data.append({
        "name": dictionary["out_name"],
        "org_name": dictionary["base_name"],
        "ref": "",  # If ref is missing in one of the dicts, pandas implicitly
                    # transforms all the ints to floats.
        **transformation,
    })

df = pd.DataFrame(data=data)
print(df)
Output:
  name      org_name     func ref    res
0  var  [abc, abc_2]  outlier      0.232
1  var      abc_cond       ==   1 -0.044
2  var  [abc, abc_2]       ==  -1 -0.323
3  var  [abc, abc_2]       <=  23 -0.123
4  var  [abc, abc_2]        >  -5 -0.433
5  var  [abc, abc_2]     else     -0.663
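To see the int-to-float conversion in isolation, here is a small sketch that builds a frame from the transformation list only. The variable names are just for illustration. It also shows one possible alternative to the empty string: pandas' nullable "Int64" dtype, which keeps the gap as a missing value while the other entries stay integers.

```python
import pandas as pd

transformations = [
    {"func": "==", "ref": -1, "res": -0.323},
    {"func": "<=", "ref": 23, "res": -0.123},
    {"func": ">", "ref": -5, "res": -0.433},
    {"func": "else", "res": -0.663},  # no "ref" key here
]

# The missing "ref" becomes NaN, so the whole column is upcast to float64.
df_float = pd.DataFrame(transformations)
print(df_float["ref"].dtype)

# Defaulting "ref" to "" before unpacking gives an object column in which
# the ints stay ints.
df_object = pd.DataFrame([{"ref": "", **t} for t in transformations])
print(df_object["ref"].tolist())

# Alternative: keep the gap as a missing value using the nullable integer dtype.
df_nullable = df_float.astype({"ref": "Int64"})
print(df_nullable["ref"].dtype)
```

Which of the two is preferable probably depends on what happens to the column afterwards: the empty string prints cleanly, whereas "Int64" keeps the column numeric.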