I have a set of files, each containing key/value pairs. For example, filename1 contains:
A : U
B : 10
C : checksum1
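If the files really were plain text in the `KEY : VALUE` form shown above, each one could be read into a dict first. This is a minimal sketch under that assumption (the real files appear to be FITS headers read with `get_fits_header`). `parse_key_values` is a hypothetical helper, and it keeps every value as a string:

```python
# Hypothetical parser for the "KEY : VALUE" lines shown above.
# Assumption: one pair per line, split on the first ':'; values stay strings.

def parse_key_values(text):
    """Return a dict mapping each key to its value (both whitespace-stripped)."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        record[key.strip()] = value.strip()
    return record


example = "A : U\nB : 10\nC : checksum1\n"
print(parse_key_values(example))  # {'A': 'U', 'B': '10', 'C': 'checksum1'}
```

Note that `B` comes back as the string `'10'`; convert it with `int()` if numeric grouping matters.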
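I want to find the unique combinations of some keys and, for each combination, collect the values of the other keys. For example, if the keys and values across my files can be laid out as: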
A B C D
-------------------------
U 10 checksum1 filename1
U 10 checksum2 filename2
U 20 checksum3 filename3
V 20 checksum4 filename4
V 20 checksum5 filename5
I would like to get:
t = table.unique_values_for(["A", "B"])
# [("U", 10), ("U", 20), ("V", 20)]
t.result_for_unique(["C", "D"])
# [
#   [("checksum1", "filename1"), ("checksum2", "filename2")],  <- result for ("U", 10)
#   [("checksum3", "filename3")],                              <- result for ("U", 20)
#   [("checksum4", "filename4"), ("checksum5", "filename5")]   <- result for ("V", 20)
# ]
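For comparison, the same shape of result can be built with a plain dict keyed on the tuple of grouping values. This is a minimal sketch, assuming each file has already been read into a dict; `group_records` and the `records` list are hypothetical and not part of any library:

```python
from collections import defaultdict

# Hypothetical records: one dict per file, matching the table above.
records = [
    {"A": "U", "B": 10, "C": "checksum1", "D": "filename1"},
    {"A": "U", "B": 10, "C": "checksum2", "D": "filename2"},
    {"A": "U", "B": 20, "C": "checksum3", "D": "filename3"},
    {"A": "V", "B": 20, "C": "checksum4", "D": "filename4"},
    {"A": "V", "B": 20, "C": "checksum5", "D": "filename5"},
]


def group_records(records, group_keys, value_keys):
    """Map each unique tuple of group_keys values to a list of value_keys tuples.

    Groups keep first-seen order, because dicts preserve insertion order.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[k] for k in group_keys)
        groups[key].append(tuple(rec[k] for k in value_keys))
    return dict(groups)


grouped = group_records(records, ["A", "B"], ["C", "D"])
unique = list(grouped)            # [('U', 10), ('U', 20), ('V', 20)]
results = list(grouped.values())  # one list of (C, D) tuples per unique key
```

Keeping both lists in one dict guarantees that `unique[i]` and `results[i]` always refer to the same group.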
I have tried using a plain dict, pandas, and astropy.table.
Here is the test code I have tried so far:
import numpy as np
from astropy.table import Table

# get_fits_header is my own helper (not shown); it returns a dict-like header


class minidb():
    def __init__(self, pattern):
        if isinstance(pattern, str):
            pattern = [pattern]
        self.pattern = pattern
        # one header per file
        self.heads = [get_fits_header(f, fast=True) for f in pattern]
        keys = self.heads[0].keys()
        # one column of values per header key
        values = [[h.get(k) for h in self.heads] for k in keys]
        dic = dict(zip(keys, values))
        dic["ARP FILENAME"] = pattern  # adding filename
        self.dic = dic
        self.table = Table(dic)  # original
        self.data = self.table
        self.unique = None
        self.names = None

    def unique_for(self, keys):
        # if isinstance(keys, str):
        #     keys = [keys]
        self.data = self.table.group_by(keys)
        self.unique = self.data.groups.keys.as_array().tolist()
        return self.unique

    def names_for(self, keys):
        if isinstance(keys, str):
            keys = [keys]
        self.names = [np.array(g[keys]).tolist() for g in self.data.groups]
        self.data = self.table[keys]
        return self.names
Pandas can do this easily with groupby:
In [1]: df = pd.DataFrame([
   ...:     dict(A='U', B=10, C=1, D=1),
   ...:     dict(A='U', B=10, C=2, D=2),
   ...:     dict(A='U', B=20, C=3, D=3),
   ...:     dict(A='V', B=20, C=4, D=4),
   ...:     dict(A='V', B=20, C=5, D=5)
   ...: ])

In [2]: list(df.groupby(['A', 'B']))
Out[2]:
[(('U', 10),
     A   B  C  D
  0  U  10  1  1
  1  U  10  2  2),
 (('U', 20),
     A   B  C  D
  2  U  20  3  3),
 (('V', 20),
     A   B  C  D
  3  V  20  4  4
  4  V  20  5  5)]
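Each element of that list is a tuple of the group key (the values of "A" and "B") and a DataFrame containing only the rows of the original DataFrame that have those values of "A" and "B". You can loop over the grouped result and pull whatever you need out of "C" and "D", just as you normally would from any DataFrame.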
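Applied to the data from the question, that loop might look like the sketch below. It assumes "C" and "D" hold the checksum and filename strings, and it produces the two lists from the desired output. `DataFrame.itertuples(index=False, name=None)` turns each group's rows into plain tuples:

```python
import pandas as pd

# Same layout as the table in the question (checksum/filename strings assumed).
df = pd.DataFrame({
    "A": ["U", "U", "U", "V", "V"],
    "B": [10, 10, 20, 20, 20],
    "C": ["checksum1", "checksum2", "checksum3", "checksum4", "checksum5"],
    "D": ["filename1", "filename2", "filename3", "filename4", "filename5"],
})

unique = []   # one (A, B) tuple per group
results = []  # one list of (C, D) tuples per group, in the same order
# groupby sorts the group keys by default (sort=True).
for key, group in df.groupby(["A", "B"]):
    unique.append(key)
    # name=None yields plain tuples instead of namedtuples.
    results.append(list(group[["C", "D"]].itertuples(index=False, name=None)))
```

The key values may come back as NumPy scalars (for example `np.int64(10)`), which still compare equal to plain Python ints.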