Pandas: compare two DataFrames of different lengths and split some rows in two


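I am learning how pandas works and am struggling to manipulate and compare pandas DataFrames.

I have three DataFrames, each reduced to the information I need: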

subjectDF:
   Subject ID              Subject  Year  Teaching Hours PW Facility Requirement
0       Mat13                Maths    13                  5                    N
1      FMat13  Further Mathematics    13                  5                    N
2       Eco13            Economics    13                  5                    N
3       Geo13            Geography    13                  5                    N
4       His13              History    13                  4                    N
5   EngLang13     English Language    13                  4                    N
6    EngLit13   English Literature    13                  4                    N
7       Ger13               German    13                  4                    N
8       Fre13               French    13                  4                    N
9       Spa13              Spanish    13                  4                    N
10      Bus13             Business    13                  4                    N
11     Film13         Film Studies    13                  4                    N
12      Psy13           Psychology    13                  5                    N
13      Lat13                Latin    13                  4                    N
14      Gre13                Greek    13                  4                    N
15      Cla13            Classical    13                  4                    N
16     Phil13           Philosophy    13                  4                    N

studentDF:
      Subject                                         Student ID  Student Number
0       Art13                                          [S8, S19]               2
1       Bio13  [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...              17
2       Bus13                                    [S10, S30, S47]               3
3       Che13  [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...              20
4       Cla13                                     [S9, S33, S35]               3
5       Com13  [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...              10
6       Eco13  [S6, S15, S17, S20, S23, S30, S31, S36, S41, S...              13
7   EngLang13                           [S9, S11, S21, S22, S47]               5
8    EngLit13                       [S5, S9, S22, S28, S32, S37]               6
9      FMat13                     [S7, S14, S27, S38, S45, S192]               6
10     Film13                                               [S8]               1
11      Fre13                     [S5, S15, S18, S29, S37, S193]               6
12      Geo13  [S6, S11, S20, S23, S32, S34, S36, S41, S42, S43]              10
13      Ger13                                   [S17, S43, S195]               3
14      Gre13                                         [S33, S40]               2
15      His13            [S5, S11, S21, S22, S32, S35, S37, S41]               8
16      Lat13                                         [S33, S35]               2
17      Mat13  [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...              34
18     Phil13              [S15, S16, S21, S40, S42, S193, S194]               7
19      Phy13  [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...              12
20      Psy13                                          [S8, S46]               2
21      Spa13                                    [S18, S36, S47]               3

classroomDF:
  Classroom ID Facility  Capacity
0            C8     None        25
1            C9     None        30
2           C10     None        12
3           C11     None        10
4           C12     None        10
5           C13     None        10
6           C14     None        20
7           C15     None        15
8           C16     None        15
9           C17     None        22
10          C22     None         5
11          C23     None         5

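I am trying to compare 'Subject ID' in subjectDF with 'Subject' in studentDF and, if a value in 'Subject' is not listed in 'Subject ID', delete that row. For example, since Bio13 in 'Subject' is not listed in 'Subject ID', I want Bio13 removed from studentDF.

So the expected output would be exactly the same as studentDF, but without the rows that are not in 'Subject ID'.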

studentDF:
      Subject                                         Student ID  Student Number
0       Art13                                          [S8, S19]               2
1       Bus13                                    [S10, S30, S47]               3

I have tried many different approaches, but most of the time I get the following error:

ValueError: Can only compare identically-labeled Series objects

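I am not sure whether I should ask this as a separate question. I will post it here for now and, if that is a problem, move it to another question.

After modifying studentDF, I want to compare 'Student Number' in studentDF with 'Capacity' in classroomDF and, if 'Student Number' > 'Capacity', split the students and the subject in two. For example, Mat13 has 34 students, which is more than the largest capacity in classroomDF. So I want to modify studentDF again, as follows:

studentDF: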
        Subject                                         Student ID  Student Number
16       ....
17      Mat13_1  [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...              17
18      Mat13_2                                    [S15, S16, S...              17
         ....

Any help with this would be greatly appreciated!

python pandas dataframe string-comparison
1 Answer

IIUC, this is what you want:

studentDF[~studentDF['Subject'].isin(subjectDF['Subject ID'])]

Output (the Student ID column is truncated here because of my Jupyter notebook display settings):

      Subject                                         Student ID  Student Number
0       Art13                                          [S8, S19]               2
1       Bio13  [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...              17
3       Che13  [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...              20
5       Com13  [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...              10
19      Phy13  [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...              12
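
Note that `~` inverts the boolean mask, so the expression above returns the subjects that are missing from subjectDF. Dropping the `~` keeps only the subjects that are listed there, which is what the question describes. `isin` also sidesteps the `ValueError`, which pandas raises when `==` is applied to two Series whose indexes differ. The sketch below runs both steps from the question on small stand-in frames: the filter, then the split of oversized classes. The abbreviated data, the placeholder student IDs for Mat13, and the `split_oversized` helper are assumptions made for illustration and are not part of the original post.

```python
import pandas as pd

# Small stand-ins for the frames in the question (abbreviated, partly made up).
subjectDF = pd.DataFrame({"Subject ID": ["Mat13", "Bus13", "Film13"]})
studentDF = pd.DataFrame({
    "Subject": ["Art13", "Bus13", "Film13", "Mat13"],
    "Student ID": [
        ["S8", "S19"],
        ["S10", "S30", "S47"],
        ["S8"],
        [f"S{i}" for i in range(1, 35)],  # 34 placeholder IDs, not the real ones
    ],
})
studentDF["Student Number"] = studentDF["Student ID"].map(len)
classroomDF = pd.DataFrame({"Classroom ID": ["C8", "C9"], "Capacity": [25, 30]})

# Step 1: keep only the subjects that appear in subjectDF (no `~` here).
kept = studentDF[studentDF["Subject"].isin(subjectDF["Subject ID"])]

# Step 2: split every subject that does not fit in the largest classroom.
max_capacity = int(classroomDF["Capacity"].max())


def split_oversized(row, capacity):
    """Hypothetical helper: return a list of row dicts, one per group."""
    students = row["Student ID"]
    n_groups = -(-len(students) // capacity)  # ceiling division
    if n_groups <= 1:
        return [row.to_dict()]
    size = -(-len(students) // n_groups)  # near-equal group size
    groups = [students[i:i + size] for i in range(0, len(students), size)]
    return [
        {
            "Subject": f"{row['Subject']}_{k}",
            "Student ID": group,
            "Student Number": len(group),
        }
        for k, group in enumerate(groups, start=1)
    ]


rows = [r for _, row in kept.iterrows() for r in split_oversized(row, max_capacity)]
result = pd.DataFrame(rows).reset_index(drop=True)
print(result[["Subject", "Student Number"]])
```

With these stand-in frames, Art13 is dropped by the filter and Mat13 (34 students against a largest capacity of 30) becomes Mat13_1 and Mat13_2 with 17 students each. The split compares against the largest capacity only; matching each subject to a specific room would need a separate assignment step.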