I am working with Awkward Array and dumping the information into a pandas DataFrame with a MultiIndex:
>>> import awkward as ak
>>> import pandas as pd
>>> ak_arr = ak.Array([
... {
... 'jet_pt': [2.33e+05, 1.1e+04, 1.47e+05, 1.33e+04, 1.73e+05, 1.07e+04],
... 'jet_num': 6,
... 'bb_dR': [0.83e-01, 0.56e-01, 0.98e-01, 0.32e-01, 0.21e-01, 0.66e-01],
... 'hh_m': 3.25e+05
... },
... {
... 'jet_pt': [1.48e+05, 2.06e+04, 9.93e+04, 1.29e+04],
... 'jet_num': 4,
... 'bb_dR': [0.12e-1, 0.32e-01, 0.45e-01, 0.76e-01, 0.33e-01, 0.54e-01],
... 'hh_m': 2.87e+05
... }
... ])
>>> ak_arr
<Array [{jet_pt: [...], ...}, {...}] type='2 * {jet_pt: var * float64, jet_...'>
>>> df = ak.to_dataframe(ak_arr, how='outer')
>>> df
                  jet_pt  jet_num  bb_dR      hh_m
entry subentry
0     0         233000.0        6  0.083  325000.0
      1          11000.0        6  0.056  325000.0
      2         147000.0        6  0.098  325000.0
      3          13300.0        6  0.032  325000.0
      4         173000.0        6  0.021  325000.0
      5          10700.0        6  0.066  325000.0
1     0         148000.0        4  0.012  287000.0
      1          20600.0        4  0.032  287000.0
      2          99300.0        4  0.045  287000.0
      3          12900.0        4  0.076  287000.0
      4              NaN        4  0.033  287000.0
      5              NaN        4  0.054  287000.0
I would like to get:
                  jet_pt
entry subentry
0     0         233000.0
      1          11000.0
      2         147000.0
      3          13300.0
      4         173000.0
      5          10700.0
1     0         148000.0
      1          20600.0
      2          99300.0
      3          12900.0
I can get this result with:
jet_num = df['jet_num'].groupby(level=0).max()
jet_pt = df['jet_pt'].groupby(level=0).apply(lambda x: x[:jet_num[x.name]]).droplevel(0)
but I feel this is inefficient.
                bb_dR
entry subentry
0     0         0.083
      1         0.056
      2         0.098
      3         0.032
1     0         0.012
      1         0.032
      2         0.045
      3         0.076
Again, I can get the desired result with:
df['bb_dR'].groupby(level=0).apply(lambda x: x[:4]).droplevel(0)
but I still think there must be a better way.
                    hh_m
entry subentry
0     0         325000.0
1     0         287000.0
For 3, I think it would also be useful to drop the entry and subentry index levels. Thanks in advance.
Answer 1
cond = df.index.get_level_values(1) < df['jet_num']
out1 = df.loc[cond, ['jet_pt']]
Output 1
                  jet_pt
entry subentry
0     0         233000.0
      1          11000.0
      2         147000.0
      3          13300.0
      4         173000.0
      5          10700.0
1     0         148000.0
      1          20600.0
      2          99300.0
      3          12900.0
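
For reference, here is a minimal self-contained sketch of the same mask idea. It assumes a small hand-built frame (the `pd.MultiIndex.from_product` construction and the toy values are stand-ins for the output of `ak.to_dataframe`, not part of the original data pipeline). The subentry number of each row is compared with that row's `jet_num`, so every entry keeps only its first `jet_num` rows and no per-group Python function is called:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the frame produced by ak.to_dataframe(..., how='outer').
index = pd.MultiIndex.from_product(
    [[0, 1], range(6)], names=["entry", "subentry"]
)
df = pd.DataFrame(
    {
        "jet_pt": [2.33e5, 1.1e4, 1.47e5, 1.33e4, 1.73e5, 1.07e4,
                   1.48e5, 2.06e4, 9.93e4, 1.29e4, np.nan, np.nan],
        "jet_num": [6] * 6 + [4] * 6,
    },
    index=index,
)

# Row-by-row comparison: subentry number vs. the jet count of the same row.
subentry = df.index.get_level_values("subentry")
mask = subentry < df["jet_num"].to_numpy()

# Keep only the rows where the subentry is a real jet.
jet_pt = df.loc[mask, ["jet_pt"]]
```

Because the comparison is vectorized over the whole index, this should scale better than a `groupby(...).apply(...)` with a lambda, although the actual gain depends on the data.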
Answer 2
out2 = df.loc[(slice(None), slice(0, 3)), ['bb_dR']]
Output 2
                bb_dR
entry subentry
0     0         0.083
      1         0.056
      2         0.098
      3         0.032
1     0         0.012
      1         0.032
      2         0.045
      3         0.076
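
The slice above is label-based, and label slices in `.loc` include both endpoints, which is why `slice(0, 3)` returns four rows per entry. A hedged sketch of two equivalent spellings follows, again on a hand-built toy frame (an assumption standing in for the `ak.to_dataframe` output). `pd.IndexSlice` is only a more readable way to write the same tuple of slices, and `groupby(...).head(4)` is a positional alternative that does not rely on the subentry labels starting at 0:

```python
import pandas as pd

# Toy stand-in for the bb_dR column of the original frame.
index = pd.MultiIndex.from_product(
    [[0, 1], range(6)], names=["entry", "subentry"]
)
df = pd.DataFrame(
    {"bb_dR": [0.083, 0.056, 0.098, 0.032, 0.021, 0.066,
               0.012, 0.032, 0.045, 0.076, 0.033, 0.054]},
    index=index,
)

idx = pd.IndexSlice

# Label-based: 0:3 is inclusive, so subentries 0, 1, 2 and 3 are kept.
# Slicing on an inner level like this needs a sorted MultiIndex.
first_four = df.loc[idx[:, 0:3], ["bb_dR"]]

# Position-based: the first four rows of every entry, whatever their labels.
first_four_by_position = df.groupby(level="entry").head(4)
```

If the index is not sorted, calling `df.sort_index()` first should make the label-based slice work.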
Answer 3
out3 = df.loc[(slice(None), 0), ['hh_m']]
Output 3
                    hh_m
entry subentry
0     0         325000.0
1     0         287000.0
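
The question also asks about dropping index levels for this third case. One possible way, sketched on a hand-built toy frame (an assumption, not the original data), is `DataFrame.xs`, which selects subentry 0 and drops that level by default. `droplevel` on the `.loc` result gives the same frame, and `reset_index(drop=True)` discards the index entirely:

```python
import pandas as pd

# Toy stand-in for the hh_m column of the original frame.
index = pd.MultiIndex.from_product(
    [[0, 1], range(6)], names=["entry", "subentry"]
)
df = pd.DataFrame({"hh_m": [3.25e5] * 6 + [2.87e5] * 6}, index=index)

# xs picks subentry 0 and removes that level, leaving only "entry".
hh_m = df.xs(0, level="subentry")[["hh_m"]]

# Same result: select with .loc as in the answer, then drop the level.
hh_m_alt = df.loc[(slice(None), 0), ["hh_m"]].droplevel("subentry")

# Drop the index altogether and fall back to a plain 0..n-1 range.
hh_m_flat = hh_m.reset_index(drop=True)
```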