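I am learning how pandas works and am struggling with manipulating and comparing pandas DataFrames.
I have three DataFrames, each reduced to only the information I need: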
subjectDF:
    Subject ID  Subject              Year  Teaching Hours PW  Facility Requirement
0   Mat13       Maths                13    5                  N
1   FMat13      Further Mathematics  13    5                  N
2   Eco13       Economics            13    5                  N
3   Geo13       Geography            13    5                  N
4   His13       History              13    4                  N
5   EngLang13   English Language     13    4                  N
6   EngLit13    English Literature   13    4                  N
7   Ger13       German               13    4                  N
8   Fre13       French               13    4                  N
9   Spa13       Spanish              13    4                  N
10  Bus13       Business             13    4                  N
11  Film13      Film Studies         13    4                  N
12  Psy13       Psychology           13    5                  N
13  Lat13       Latin                13    4                  N
14  Gre13       Greek                13    4                  N
15  Cla13       Classical            13    4                  N
16  Phil13      Philosophy           13    4                  N
studentDF:
    Subject    Student ID                                         Student Number
0   Art13      [S8, S19]                                          2
1   Bio13      [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...  17
2   Bus13      [S10, S30, S47]                                    3
3   Che13      [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...  20
4   Cla13      [S9, S33, S35]                                     3
5   Com13      [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...  10
6   Eco13      [S6, S15, S17, S20, S23, S30, S31, S36, S41, S...  13
7   EngLang13  [S9, S11, S21, S22, S47]                           5
8   EngLit13   [S5, S9, S22, S28, S32, S37]                       6
9   FMat13     [S7, S14, S27, S38, S45, S192]                     6
10  Film13     [S8]                                               1
11  Fre13      [S5, S15, S18, S29, S37, S193]                     6
12  Geo13      [S6, S11, S20, S23, S32, S34, S36, S41, S42, S43]  10
13  Ger13      [S17, S43, S195]                                   3
14  Gre13      [S33, S40]                                         2
15  His13      [S5, S11, S21, S22, S32, S35, S37, S41]            8
16  Lat13      [S33, S35]                                         2
17  Mat13      [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...  34
18  Phil13     [S15, S16, S21, S40, S42, S193, S194]              7
19  Phy13      [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...  12
20  Psy13      [S8, S46]                                          2
21  Spa13      [S18, S36, S47]                                    3
classroomDF:
    Classroom ID  Facility  Capacity
0   C8            None      25
1   C9            None      30
2   C10           None      12
3   C11           None      10
4   C12           None      10
5   C13           None      10
6   C14           None      20
7   C15           None      15
8   C16           None      15
9   C17           None      22
10  C22           None      5
11  C23           None      5
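I am trying to compare 'Subject ID' in subjectDF with 'Subject' in studentDF, and delete any row of studentDF whose 'Subject' is not listed in 'Subject ID'. For example, since Bio13 in 'Subject' is not listed in 'Subject ID', I would like Bio13 to be removed from studentDF.
So the expected output would be exactly the same as studentDF, but without the rows that are not in 'Subject ID'.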
studentDF:
    Subject  Student ID       Student Number
0   Art13    [S8, S19]        2
1   Bus13    [S10, S30, S47]  3
I have tried many different approaches, but most of the time I run into the following error:
ValueError: Can only compare identically-labeled Series objects
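I am not sure whether this should be a separate question. I am posting it here for now and, if that is a problem, I will move it to a question of its own.
After modifying studentDF, I would like to compare 'Student Number' in studentDF with 'Capacity' in classroomDF and, if 'Student Number' > 'Capacity', split the students and the subject into two. For example, Mat13 has 34 students, which is more than the largest capacity in classroomDF. So I would like to modify studentDF again, as follows:
studentDF: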
    Subject  Student ID                                         Student Number
16  ....
17  Mat13_1  [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...  17
18  Mat13_2  [S15, S16, S...                                    17
....
Any help with this would be much appreciated!
IIUC, this is what you want:
studentDF[~studentDF['Subject'].isin(subjectDF['Subject ID'])]
Output (the Student ID column is truncated here because of my Jupyter notebook display settings):
    Subject    Student ID                                         Student Number
0   Art13      [S8, S19]                                          2
1   Bio13      [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...  17
3   Che13      [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...  20
5   Com13      [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...  10
19  Phy13      [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...  12
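Note that with the leading `~` the expression returns the subjects that are absent from subjectDF. If the goal is to keep only the subjects that are listed there, as the question describes, dropping the `~` should do it. Below is a minimal sketch of both steps from the question. It uses small stand-in frames with the question's column names; the rows, the 34 placeholder student IDs, and the names `mask`, `listed`, `unlisted`, `max_capacity` and `split_df` are assumptions for illustration, not the real data. It also shows one likely source of the ValueError: comparing two Series of different lengths with `==`.

```python
import pandas as pd

# Stand-in data: abbreviated, illustrative rows only (not the full frames).
subjectDF = pd.DataFrame({"Subject ID": ["Mat13", "Bus13", "Cla13"]})
studentDF = pd.DataFrame({
    "Subject": ["Art13", "Bio13", "Bus13", "Cla13", "Mat13"],
    "Student ID": [
        ["S8", "S19"],
        ["S1", "S4", "S12"],
        ["S10", "S30", "S47"],
        ["S9", "S33", "S35"],
        [f"S{i}" for i in range(1, 35)],  # 34 placeholder IDs
    ],
})
studentDF["Student Number"] = studentDF["Student ID"].apply(len)
classroomDF = pd.DataFrame({"Classroom ID": ["C8", "C9", "C10"],
                            "Capacity": [25, 30, 12]})

# Element-wise == needs identically labelled Series; these differ in length.
try:
    studentDF["Subject"] == subjectDF["Subject ID"]
except ValueError as exc:
    print("ValueError:", exc)

# Step 1: membership test instead of element-wise comparison.
mask = studentDF["Subject"].isin(subjectDF["Subject ID"])
listed = studentDF[mask].reset_index(drop=True)  # subjects found in subjectDF
unlisted = studentDF[~mask]                      # what the `~` version returns

# Step 2: split any subject larger than the biggest classroom into two halves.
max_capacity = classroomDF["Capacity"].max()
rows = []
for row in listed.to_dict("records"):
    ids = row["Student ID"]
    if len(ids) > max_capacity:
        half = (len(ids) + 1) // 2  # first half takes the extra student
        for n, part in enumerate((ids[:half], ids[half:]), start=1):
            rows.append({"Subject": f"{row['Subject']}_{n}",
                         "Student ID": part,
                         "Student Number": len(part)})
    else:
        rows.append(row)
split_df = pd.DataFrame(rows)
print(split_df)
```

This splits each oversized subject exactly once, as in the Mat13 example. It does not check that each half then fits into a particular classroom, so a subject with more than twice the largest capacity would need a further split.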