我是 pandas 新手,正在努力解决一些数据争论问题。
我有一个大约如下所示的数据源:
location available sold name local_id more_data
0 1001 - BBB 1 0 Alpha 24 'DJQ3DD3y'
1 1001 - BBB 1 0 Alpha 24 'aB3joXQy'
2 1001 - BBB 1 0 Alpha 24 'ZJve572B'
3 1001 - BBB 1 0 Alpha 24 'DJEkx8Dy'
4 1001 - BBB 1 0 Alpha 24 'VyVaWLYp'
5 1001 - BBB 1 0 Bravo 19 'Rpr7AvVy'
6 1001 - BBB 1 0 Bravo 19 'ZJlO0VmB'
7 1001 - BBB 1 0 Bravo 19 'OBb6NrrB'
8 1001 - BBB 1 0 Bravo 19 'ZJvPPEXy'
9 1001 - BBB 1 0 Charlie 6 'Vy9MMOEy'
10 1001 - BBB 1 0 Charlie 6 'MJ8AALKp'
11 1001 - BBB 1 0 Delta 17 'vpWmN1kB'
12 1001 - BBB 1 0 Delta 17 'DJEb9qQp'
13 1001 - BBB 1 0 Echo 7 'qyZ1zn1p'
14 1001 - BBB 1 0 Echo 7 'bBqaYoMB'
15 1001 - BBB 1 0 Golf 22 'AJgLr9qp'
16 1001 - BBB 1 0 Golf 22 'vBdV57Ap'
17 1001 - BBB 1 0 Golf 22 'VJYxnLZB'
18 1001 - GG 1029 237 Charlie 6 'VJYxnGXB'
19 1001 - GG 1029 237 Charlie 6 'Vy9Mo52y'
20 1001 - GG 1029 237 Delta 17 'aB3zxYWy'
21 1001 - GG 1029 237 Delta 17 'MJ8A3z0p'
22 1001 - GG 1029 237 Echo 7 'YpLMPwNy'
23 1001 - GG 1029 237 Echo 7 '8Bwev1ep'
24 1001 - GG 1029 237 Golf 22 'MJXm6bLp'
25 1001 - GG 1029 237 Golf 22 'oye7XR0J'
26 1001 - GG 1029 237 Golf 22 'vpWmDDYB'
27 1001 - P 873 375 Charlie 6 'DJEbjjkp'
28 1001 - P 873 375 Charlie 6 'aB3z66zy'
29 1001 - P 873 375 Delta 17 'Kp4zrrKB'
30 1001 - P 873 375 Delta 17 'oyxqMMAB'
31 1001 - P 873 375 Echo 7 'zJ1KMMZy'
32 1001 - P 873 375 Echo 7 'ZJlOzz6B'
33 1001 - P 873 375 Foxtrot 20 'YpLMbbay'
34 1001 - P 873 375 Foxtrot 20 'ZJnmzzYB'
35 1001 - P 873 375 Golf 22 'Kp4zr5LB'
36 1001 - P 873 375 Golf 22 'oye7jg8J'
37 1001 - P 873 375 Golf 22 'OBb6jE3B'
38 1002 - GG 37 11 Charlie 6 'EyGMWPbJ'
39 1002 - GG 37 11 Charlie 6 'aB3zOoDy'
40 1002 - GG 37 11 Delta 17 'DJQ4laLB'
41 1002 - GG 37 11 Delta 17 'ZJlOvNXB'
42 1002 - GG 37 11 Echo 7 'Rpr7a8Dy'
43 1002 - GG 37 11 Echo 7 'zJjYNR4B'
44 1002 - GG 37 11 Golf 22 'Vy9MqkRy'
45 1002 - GG 37 11 Golf 22 'oye7Y0kJ'
46 1002 - GG 37 11 Golf 22 '8BweZbnp'
47 1002 - P 1854 826 Charlie 6 'Rpr7Z5by'
48 1002 - P 1854 826 Charlie 6 'vBdVK1Ap'
49 1002 - P 1854 826 Delta 17 '4Jkae8Dy'
50 1002 - P 1854 826 Delta 17 'YpLM3nxy'
51 1002 - P 1854 826 Echo 7 'VB7vD6Py'
52 1002 - P 1854 826 Echo 7 'ZJlOXbzB'
53 1002 - P 1854 826 Foxtrot 20 'MpNqezKJ'
54 1002 - P 1854 826 Foxtrot 20 '9pOWo39p'
55 1002 - P 1854 826 Golf 22 'MJXm5qnp'
56 1002 - P 1854 826 Golf 22 'oy5vxd4B'
57 1002 - P 1854 826 Golf 22 'DJQ4qz3B'
如您所见,
available
和 sold
是与 location
列相关的。我想做的是将这些按 location
列的第一部分(即“1001 - XXX”的“1001”部分)分组,并对 available
和 sold
的唯一值求和,同时保留每行唯一的其他数据,并包括作为键的数字类型,我不想更改。
因此,输出将如下所示:
location available sold name local_id more_data
0 1001 1903 612 Alpha 24 'DJQ3DD3y'
1 1001 1903 612 Alpha 24 'aB3joXQy'
2 1001 1903 612 Alpha 24 'ZJve572B'
3 1001 1903 612 Alpha 24 'DJEkx8Dy'
4 1001 1903 612 Alpha 24 'VyVaWLYp'
5 1001 1903 612 Bravo 19 'Rpr7AvVy'
6 1001 1903 612 Bravo 19 'ZJlO0VmB'
7 1001 1903 612 Bravo 19 'OBb6NrrB'
8 1001 1903 612 Bravo 19 'ZJvPPEXy'
9 1001 1903 612 Charlie 6 'Vy9MMOEy'
10 1001 1903 612 Charlie 6 'MJ8AALKp'
11 1001 1903 612 Delta 17 'vpWmN1kB'
12 1001 1903 612 Delta 17 'DJEb9qQp'
13 1001 1903 612 Echo 7 'qyZ1zn1p'
14 1001 1903 612 Echo 7 'bBqaYoMB'
15 1001 1903 612 Golf 22 'AJgLr9qp'
16 1001 1903 612 Golf 22 'vBdV57Ap'
17 1001 1903 612 Golf 22 'VJYxnLZB'
18 1001 1903 612 Charlie 6 'VJYxnGXB'
19 1001 1903 612 Charlie 6 'Vy9Mo52y'
20 1001 1903 612 Delta 17 'aB3zxYWy'
21 1001 1903 612 Delta 17 'MJ8A3z0p'
22 1001 1903 612 Echo 7 'YpLMPwNy'
23 1001 1903 612 Echo 7 '8Bwev1ep'
24 1001 1903 612 Golf 22 'MJXm6bLp'
25 1001 1903 612 Golf 22 'oye7XR0J'
26 1001 1903 612 Golf 22 'vpWmDDYB'
27 1001 1903 612 Charlie 6 'DJEbjjkp'
28 1001 1903 612 Charlie 6 'aB3z66zy'
29 1001 1903 612 Delta 17 'Kp4zrrKB'
30 1001 1903 612 Delta 17 'oyxqMMAB'
31 1001 1903 612 Echo 7 'zJ1KMMZy'
32 1001 1903 612 Echo 7 'ZJlOzz6B'
33 1001 1903 612 Foxtrot 20 'YpLMbbay'
34 1001 1903 612 Foxtrot 20 'ZJnmzzYB'
35 1001 1903 612 Golf 22 'Kp4zr5LB'
36 1001 1903 612 Golf 22 'oye7jg8J'
37 1001 1903 612 Golf 22 'OBb6jE3B'
38 1002 1891 837 Charlie 6 'EyGMWPbJ'
39 1002 1891 837 Charlie 6 'aB3zOoDy'
40 1002 1891 837 Delta 17 'DJQ4laLB'
41 1002 1891 837 Delta 17 'ZJlOvNXB'
42 1002 1891 837 Echo 7 'Rpr7a8Dy'
43 1002 1891 837 Echo 7 'zJjYNR4B'
44 1002 1891 837 Golf 22 'Vy9MqkRy'
45 1002 1891 837 Golf 22 'oye7Y0kJ'
46 1002 1891 837 Golf 22 '8BweZbnp'
47 1002 1891 837 Charlie 6 'Rpr7Z5by'
48 1002 1891 837 Charlie 6 'vBdVK1Ap'
49 1002 1891 837 Delta 17 '4Jkae8Dy'
50 1002 1891 837 Delta 17 'YpLM3nxy'
51 1002 1891 837 Echo 7 'VB7vD6Py'
52 1002 1891 837 Echo 7 'ZJlOXbzB'
53 1002 1891 837 Foxtrot 20 'MpNqezKJ'
54 1002 1891 837 Foxtrot 20 '9pOWo39p'
55 1002 1891 837 Golf 22 'MJXm5qnp'
56 1002 1891 837 Golf 22 'oy5vxd4B'
57 1002 1891 837 Golf 22 'DJQ4qz3B'
我知道我错过了一些超级简单的东西,因为这是熊猫驾驶室的问题。但遗憾的是,这正是我处于学习曲线上的地方;我希望有人能引导我走向正确的方向。
用途:
out = (df.assign(location = df['location'].str.split(' - ').str[0])
.groupby(df.columns.difference(['available','sold']).tolist(), as_index=False)
.sum()
.reindex(df.columns, axis=1))