Here is an interesting problem. Given a pandas DataFrame (or even a plain Python list), how can I find the subtotals that may be embedded in it? For example:
    running  value
0     False  50709
1     False  26715
2     False   1715
3     False  79139
4     False  34447
5     False  -7256
6     False   1210
7     False  42913
8      True  36227
9     False    999
10    False  20107
11    False   5787
12    False  -1466
13    False   -216
14    False    615
15    False  24827
16     True  11400
17    False   5642
18     True   5758
19    False     -5
20     True   5753
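Observations on the data: rows [3, 7, 15] are subtotals, and rows [8, 16, 18, 20] are running totals. The subtotals [3, 7, 15] represent rows [0, 1, 2], [4, 5, 6], and [10, 11, 12, 13, 14], respectively. I need to identify the subtotals and the rows that each subtotal represents.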
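To make the observation concrete, here is a small check of the three subtotals against the sample values. The multipliers in `claims` were worked out by hand from the data above and are not stated in the original post, so treat them as an assumption. Note that row 5 has to enter its subtotal with multiplier -1, which is why a plain running sum is not enough.

```python
values = [50709, 26715, 1715, 79139, 34447, -7256, 1210, 42913, 36227, 999,
          20107, 5787, -1466, -216, 615, 24827, 11400, 5642, 5758, -5, 5753]

# Hand-derived assumption: subtotal row -> {item row: multiplier}.
claims = {
    3: {0: 1, 1: 1, 2: 1},
    7: {4: 1, 5: -1, 6: 1},   # row 5 (-7256) is subtracted, i.e. adds 7256
    15: {10: 1, 11: 1, 12: 1, 13: 1, 14: 1},
}


def signed_sum(values, multipliers):
    """Sum values[i] * multiplier over the given rows."""
    return sum(m * values[i] for i, m in multipliers.items())


for row, multipliers in claims.items():
    # Each line prints the subtotal row and True if the claim holds.
    print(row, signed_sum(values, multipliers) == values[row])
```

So the search has to consider three options per row (skip, add, subtract), not just a contiguous sum.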
See my answer below.
I have an answer:
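import pandas as pd


def gen_subtotal_indices(df):
    """Yield (subtotal_index, candidate_indices, multipliers) for each subtotal found."""
    targets = set()    # reachable signed sums, kept as a set for fast membership tests
    targets_lst = []   # the same sums as a list, index-aligned with `signs`
    signs = []         # signs[k] is the multiplier vector that produces targets_lst[k]
    indices = []       # rows accumulated since the last subtotal
    for i, r in df.iterrows():
        if r['running']:
            continue   # running totals are neither items nor subtotals
        v = r['value']
        if v in targets:
            # v equals a signed sum of the accumulated rows: report it as a subtotal
            yield i, indices, signs[targets_lst.index(v)]
            # reset the state for the next group
            targets = set()
            targets_lst = []
            signs = []
            indices = []
            continue
        if len(targets) == 0:
            # first row of a group: skip it (0), add it (+v), or subtract it (-v)
            targets = {x for x in (0, v, -v)}
            targets_lst = [x for x in (0, v, -v)]
            signs = [[x] for x in (0, 1, -1)]
            indices.append(i)
        else:
            # extend every reachable sum with the three options for this row;
            # note that targets_lst and signs triple in length on every row
            targets |= {t + x for t in targets for x in (0, v, -v)}
            targets_lst = [t + x for t in targets_lst for x in (0, v, -v)]
            signs = [t + [x] for t in signs for x in (0, 1, -1)]
            indices.append(i)


df = pd.DataFrame({'running': [False, False, False, False, False, False, False, False, True, False, False,
                               False, False, False, False, False, True, False, True, False, True],
                   'value': [50709, 26715, 1715, 79139, 34447, -7256, 1210, 42913, 36227, 999, 20107, 5787, -1466,
                             -216, 615, 24827, 11400, 5642, 5758, -5, 5753]})
print(df)
result = list(gen_subtotal_indices(df))
print(result)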
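This prints the data frame, followed by one tuple per detected subtotal.
Each tuple holds the index of the subtotal, then the indices of the candidate rows, then a vector of 0, 1, or -1 giving the multiplier applied to each of those rows' original values. It could be better, but this is the general idea.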
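One possible refinement, sketched here under my own assumptions rather than as a definitive implementation: keep a single dict that maps each reachable sum to the first multiplier vector that produces it. This drops duplicate sums and avoids the linear `list.index` lookup, although the number of distinct sums can still grow exponentially in the worst case. The function name `find_subtotals` and its two-list signature are hypothetical; it works on plain Python lists, so pandas is not needed.

```python
def find_subtotals(values, running):
    """Yield (subtotal_index, candidate_indices, multipliers) for each subtotal."""
    reachable = {}   # signed sum -> first multiplier vector that produces it
    indices = []     # rows accumulated since the last subtotal
    for i, (v, is_running) in enumerate(zip(values, running)):
        if is_running:
            continue  # running totals are ignored entirely
        if v in reachable:
            yield i, indices, reachable[v]
            reachable, indices = {}, []  # start a new group
            continue
        # An empty dict means a new group: seed it with the empty sum.
        previous = reachable or {0: []}
        extended = {}
        for total, multipliers in previous.items():
            for x in (0, 1, -1):  # skip, add, or subtract this row
                # setdefault keeps the first vector found for each sum
                extended.setdefault(total + x * v, multipliers + [x])
        reachable = extended
        indices.append(i)


values = [50709, 26715, 1715, 79139, 34447, -7256, 1210, 42913, 36227, 999,
          20107, 5787, -1466, -216, 615, 24827, 11400, 5642, 5758, -5, 5753]
running = [i in (8, 16, 18, 20) for i in range(len(values))]

result = list(find_subtotals(values, running))
print(result)
# [(3, [0, 1, 2], [1, 1, 1]), (7, [4, 5, 6], [1, -1, 1]),
#  (15, [9, 10, 11, 12, 13, 14], [0, 1, 1, 1, 1, 1])]
```

In the last tuple, row 9 is listed as a candidate but carries multiplier 0, so it is not actually part of the subtotal. Because dicts preserve insertion order, this variant should pick the same first match as the list-based version.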
Any ideas or improvements are appreciated!