How to group and sum duplicated data in pandas while still keeping the other columns


I'm new to pandas and am struggling with some data wrangling.

I have a data source that looks roughly like this:

      location  available  sold     name  local_id    more_data
0   1001 - BBB          1     0    Alpha        24   'DJQ3DD3y'
1   1001 - BBB          1     0    Alpha        24   'aB3joXQy'
2   1001 - BBB          1     0    Alpha        24   'ZJve572B'
3   1001 - BBB          1     0    Alpha        24   'DJEkx8Dy'
4   1001 - BBB          1     0    Alpha        24   'VyVaWLYp'
5   1001 - BBB          1     0    Bravo        19   'Rpr7AvVy'
6   1001 - BBB          1     0    Bravo        19   'ZJlO0VmB'
7   1001 - BBB          1     0    Bravo        19   'OBb6NrrB'
8   1001 - BBB          1     0    Bravo        19   'ZJvPPEXy'
9   1001 - BBB          1     0  Charlie         6   'Vy9MMOEy'
10  1001 - BBB          1     0  Charlie         6   'MJ8AALKp'
11  1001 - BBB          1     0    Delta        17   'vpWmN1kB'
12  1001 - BBB          1     0    Delta        17   'DJEb9qQp'
13  1001 - BBB          1     0     Echo         7   'qyZ1zn1p'
14  1001 - BBB          1     0     Echo         7   'bBqaYoMB'
15  1001 - BBB          1     0     Golf        22   'AJgLr9qp'
16  1001 - BBB          1     0     Golf        22   'vBdV57Ap'
17  1001 - BBB          1     0     Golf        22   'VJYxnLZB'
18   1001 - GG       1029   237  Charlie         6   'VJYxnGXB'
19   1001 - GG       1029   237  Charlie         6   'Vy9Mo52y'
20   1001 - GG       1029   237    Delta        17   'aB3zxYWy'
21   1001 - GG       1029   237    Delta        17   'MJ8A3z0p'
22   1001 - GG       1029   237     Echo         7   'YpLMPwNy'
23   1001 - GG       1029   237     Echo         7   '8Bwev1ep'
24   1001 - GG       1029   237     Golf        22   'MJXm6bLp'
25   1001 - GG       1029   237     Golf        22   'oye7XR0J'
26   1001 - GG       1029   237     Golf        22   'vpWmDDYB'
27    1001 - P        873   375  Charlie         6   'DJEbjjkp'
28    1001 - P        873   375  Charlie         6   'aB3z66zy'
29    1001 - P        873   375    Delta        17   'Kp4zrrKB'
30    1001 - P        873   375    Delta        17   'oyxqMMAB'
31    1001 - P        873   375     Echo         7   'zJ1KMMZy'
32    1001 - P        873   375     Echo         7   'ZJlOzz6B'
33    1001 - P        873   375  Foxtrot        20   'YpLMbbay'
34    1001 - P        873   375  Foxtrot        20   'ZJnmzzYB'
35    1001 - P        873   375     Golf        22   'Kp4zr5LB'
36    1001 - P        873   375     Golf        22   'oye7jg8J'
37    1001 - P        873   375     Golf        22   'OBb6jE3B'
38   1002 - GG         37    11  Charlie         6   'EyGMWPbJ'
39   1002 - GG         37    11  Charlie         6   'aB3zOoDy'
40   1002 - GG         37    11    Delta        17   'DJQ4laLB'
41   1002 - GG         37    11    Delta        17   'ZJlOvNXB'
42   1002 - GG         37    11     Echo         7   'Rpr7a8Dy'
43   1002 - GG         37    11     Echo         7   'zJjYNR4B'
44   1002 - GG         37    11     Golf        22   'Vy9MqkRy'
45   1002 - GG         37    11     Golf        22   'oye7Y0kJ'
46   1002 - GG         37    11     Golf        22   '8BweZbnp'
47    1002 - P       1854   826  Charlie         6   'Rpr7Z5by'
48    1002 - P       1854   826  Charlie         6   'vBdVK1Ap'
49    1002 - P       1854   826    Delta        17   '4Jkae8Dy'
50    1002 - P       1854   826    Delta        17   'YpLM3nxy'
51    1002 - P       1854   826     Echo         7   'VB7vD6Py'
52    1002 - P       1854   826     Echo         7   'ZJlOXbzB'
53    1002 - P       1854   826  Foxtrot        20   'MpNqezKJ'
54    1002 - P       1854   826  Foxtrot        20   '9pOWo39p'
55    1002 - P       1854   826     Golf        22   'MJXm5qnp'
56    1002 - P       1854   826     Golf        22   'oy5vxd4B'
57    1002 - P       1854   826     Golf        22   'DJQ4qz3B'

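As you can see, available and sold are tied to the location column. What I want to do is group these rows by the first part of the location column (i.e. the "1001" part of "1001 - XXX") and sum the unique values of available and sold, while keeping the other data that is unique to each row. That includes numeric columns that act as keys (such as local_id), which I don't want to change.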

So the output would look like this:

    location  available  sold     name  local_id    more_data
0       1001       1903   612    Alpha        24   'DJQ3DD3y'
1       1001       1903   612    Alpha        24   'aB3joXQy'
2       1001       1903   612    Alpha        24   'ZJve572B'
3       1001       1903   612    Alpha        24   'DJEkx8Dy'
4       1001       1903   612    Alpha        24   'VyVaWLYp'
5       1001       1903   612    Bravo        19   'Rpr7AvVy'
6       1001       1903   612    Bravo        19   'ZJlO0VmB'
7       1001       1903   612    Bravo        19   'OBb6NrrB'
8       1001       1903   612    Bravo        19   'ZJvPPEXy'
9       1001       1903   612  Charlie         6   'Vy9MMOEy'
10      1001       1903   612  Charlie         6   'MJ8AALKp'
11      1001       1903   612    Delta        17   'vpWmN1kB'
12      1001       1903   612    Delta        17   'DJEb9qQp'
13      1001       1903   612     Echo         7   'qyZ1zn1p'
14      1001       1903   612     Echo         7   'bBqaYoMB'
15      1001       1903   612     Golf        22   'AJgLr9qp'
16      1001       1903   612     Golf        22   'vBdV57Ap'
17      1001       1903   612     Golf        22   'VJYxnLZB'
18      1001       1903   612  Charlie         6   'VJYxnGXB'
19      1001       1903   612  Charlie         6   'Vy9Mo52y'
20      1001       1903   612    Delta        17   'aB3zxYWy'
21      1001       1903   612    Delta        17   'MJ8A3z0p'
22      1001       1903   612     Echo         7   'YpLMPwNy'
23      1001       1903   612     Echo         7   '8Bwev1ep'
24      1001       1903   612     Golf        22   'MJXm6bLp'
25      1001       1903   612     Golf        22   'oye7XR0J'
26      1001       1903   612     Golf        22   'vpWmDDYB'
27      1001       1903   612  Charlie         6   'DJEbjjkp'
28      1001       1903   612  Charlie         6   'aB3z66zy'
29      1001       1903   612    Delta        17   'Kp4zrrKB'
30      1001       1903   612    Delta        17   'oyxqMMAB'
31      1001       1903   612     Echo         7   'zJ1KMMZy'
32      1001       1903   612     Echo         7   'ZJlOzz6B'
33      1001       1903   612  Foxtrot        20   'YpLMbbay'
34      1001       1903   612  Foxtrot        20   'ZJnmzzYB'
35      1001       1903   612     Golf        22   'Kp4zr5LB'
36      1001       1903   612     Golf        22   'oye7jg8J'
37      1001       1903   612     Golf        22   'OBb6jE3B'
38      1002       1891   837  Charlie         6   'EyGMWPbJ'
39      1002       1891   837  Charlie         6   'aB3zOoDy'
40      1002       1891   837    Delta        17   'DJQ4laLB'
41      1002       1891   837    Delta        17   'ZJlOvNXB'
42      1002       1891   837     Echo         7   'Rpr7a8Dy'
43      1002       1891   837     Echo         7   'zJjYNR4B'
44      1002       1891   837     Golf        22   'Vy9MqkRy'
45      1002       1891   837     Golf        22   'oye7Y0kJ'
46      1002       1891   837     Golf        22   '8BweZbnp'
47      1002       1891   837  Charlie         6   'Rpr7Z5by'
48      1002       1891   837  Charlie         6   'vBdVK1Ap'
49      1002       1891   837    Delta        17   '4Jkae8Dy'
50      1002       1891   837    Delta        17   'YpLM3nxy'
51      1002       1891   837     Echo         7   'VB7vD6Py'
52      1002       1891   837     Echo         7   'ZJlOXbzB'
53      1002       1891   837  Foxtrot        20   'MpNqezKJ'
54      1002       1891   837  Foxtrot        20   '9pOWo39p'
55      1002       1891   837     Golf        22   'MJXm5qnp'
56      1002       1891   837     Golf        22   'oy5vxd4B'
57      1002       1891   837     Golf        22   'DJQ4qz3B'

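I know I'm missing something super simple, because this is right in pandas' wheelhouse. Sadly, this is exactly where I am on the learning curve; I'm hoping someone can point me in the right direction.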

pandas dataframe group-by
1 Answer

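Count each full location once, sum per location prefix, then write the totals back onto every row:

# Assumes pandas is imported as pd and df is the DataFrame from the question.
out = df.assign(location=df['location'].str.split(' - ').str[0])
# available/sold repeat on every row of a full location, so keep one row per location.
totals = (out.loc[~df['location'].duplicated()]
             .groupby('location')[['available', 'sold']].sum())
# Broadcast the per-prefix totals to all rows; the other columns are left untouched.
out[['available', 'sold']] = totals.loc[out['location']].to_numpy()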
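To try the grouping on a few rows copied from the question's table, here is a minimal self-contained sketch. It assumes, as the question suggests, that available and sold are identical on every row of a given full location, so each full location should contribute exactly once to the total. The helper name add_prefix_totals is made up for illustration.

```python
import pandas as pd


def add_prefix_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Replace location by its prefix and broadcast per-prefix totals.

    Assumption: 'available' and 'sold' are constant within each full
    location, so every full location is counted exactly once.
    """
    # "1001 - BBB" -> "1001"
    prefix = df["location"].str.split(" - ").str[0]
    # Keep one row per full location, tagged with its prefix.
    per_location = df.assign(prefix=prefix).drop_duplicates("location")
    # Sum the per-location figures within each prefix.
    totals = per_location.groupby("prefix")[["available", "sold"]].sum()
    out = df.assign(location=prefix)
    # Look up each row's totals by prefix; row order and the other
    # columns (name, local_id, more_data) are left as they were.
    out["available"] = prefix.map(totals["available"])
    out["sold"] = prefix.map(totals["sold"])
    return out


# A handful of rows from the question's data.
df = pd.DataFrame(
    {
        "location": ["1001 - BBB", "1001 - BBB", "1001 - GG",
                     "1001 - P", "1002 - GG", "1002 - P"],
        "available": [1, 1, 1029, 873, 37, 1854],
        "sold": [0, 0, 237, 375, 11, 826],
        "name": ["Alpha", "Alpha", "Charlie", "Charlie", "Charlie", "Golf"],
        "local_id": [24, 24, 6, 6, 6, 22],
        "more_data": ["DJQ3DD3y", "aB3joXQy", "VJYxnGXB",
                      "DJEbjjkp", "EyGMWPbJ", "MJXm5qnp"],
    }
)

result = add_prefix_totals(df)
print(result)
```

The totals are computed on location alone rather than on all remaining columns: more_data is unique per row, so including it in the group keys would leave every group with a single row and nothing to sum. Note that location ends up as a string ("1001"); converting it with astype(int) is an optional extra step.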