Move cell values between rows of a df using pandas

Question (votes: 0, answers: 1)

I have the following input table and I want to obtain the output table.

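To do this, I have to check the last part of each cell's content from row 14 to the end of the file and, if there is a match, move the whole cell value into rows 10, 11, 12, 13 ("STA ATT", "TEC", "A SPV", "D C P"). I have to do this for every column up to the last one.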

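I wrote code that does this and it works, but it is messy, inefficient and slow. I wrote it with openpyxl, which I suspect is the slowest way to do it, and I think pandas would be much faster.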

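In the example below:
For column 095, I have to move the content of row 16 to row 10 and the content of row 17 to row 11.
For the first column 088, I have to move the content of row 14 to row 12, row 15 to row 13, and row 17 to row 11.
For the second column 088, I have to move the content of row 18 to row 12, row 19 to row 10, and row 20 to row 11, and so on.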

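Starting from ROW 14, each column has only 4 values that need to be moved to the corresponding rows. The values are not necessarily consecutive, but starting from the first value (which can be an empty cell), the remaining values are in the following 3 cells. For example, in column 095 I will move only 2 values, X XX X_05.04.2024 16:40.STA ATT and X XX X_05.04.2024 16:40.TEC, because the first two cells are empty.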

Can you help me improve the performance? With the code I wrote, it takes 15 minutes to process a df with 482 columns and 8k rows.

#Input

0  | Unnamed: 0                 | 095                               | 088                             | 088                               | 713                             | 714                           |
1  | APP SPV                    |                                   |                                 |                                   |                                 |                               |
2  | DAT                        |                                   | 11/04/2024                      | 11/04/2024                        |                                 |                               |
3  | ST AT                      | AN                                | ABA                             | ABA                               |                                 | PP                            |
4  | TEC                        | TOP                               |                                 |                                   | CAS                             |                               |
5  | DATE                       |                                   |                                 |                                   |                                 |                               |
6  |                            |                                   |                                 |                                   |                                 |                               |
7  |                            |                                   |                                 |                                   |                                 |                               |
8  |                            |                                   |                                 |                                   |                                 |                               |
9  |                            |                                   |                                 |                                   |                                 |                               |
10 | STA ATT                    |                                   |                                 |                                   |                                 |                               |
11 | TEC                        |                                   |                                 |                                   |                                 |                               |
12 | A SPV                      |                                   |                                 |                                   |                                 |                               |
13 | D C P                      |                                   |                                 |                                   |                                 |                               |
14 | A SPV_05.04.2024 16:40     |                                   | X XX X_05.04.2024 16:40.A SPV   |                                   |                                 |                               |
15 | D C P_05.04.2024 16:40     |                                   | X XX X_05.04.2024 16:40.D C P   |                                   |                                 |                               |
16 | STA ATT_05.04.2024 16:40   | X XX X_05.04.2024 16:40.STA ATT   |                                 |                                   | X XX X_05.04.2024 16:40.STA ATT |                               |
17 | TEC_05.04.2024 16:40       | X XX X_05.04.2024 16:40.TEC       | X XX X_05.04.2024 16:40.TEC     |                                   |                                 |                               |
18 | A SPV_05.04.2024 18:37     |                                   |                                 |  X XX X_05.04.2024 18:37.A SPV    |                                 |                               |
19 | STA ATT_05.04.2024 18:37   |                                   |                                 |  X XX X_05.04.2024 18:37.STA ATT  |                                 |                               |
20 | TEC_05.04.2024 18:37       |                                   |                                 |  X XX X_05.04.2024 18:37.TEC      |                                 |                               |
21 | A SPV_06.04.2024 10:11     |                                   |                                 |                                   |                                 | X XX X_06.04.2024 10:11.A SPV |

#Output

0  | Unnamed: 0                 | 095                               | 088                             | 088                               | 713                             | 714                           |
1  | APP SPV                    |                                   |                                 |                                   |                                 |                               |
2  | DAT                        |                                   | 11/04/2024                      | 11/04/2024                        |                                 |                               |
3  | ST AT                      | AN                                | ABA                             | ABA                               |                                 | PP                            |
4  | TEC                        | TOP                               |                                 |                                   | CAS                             |                               |
5  | DATE                       |                                   |                                 |                                   |                                 |                               |
6  |                            |                                   |                                 |                                   |                                 |                               |
7  |                            |                                   |                                 |                                   |                                 |                               |
8  |                            |                                   |                                 |                                   |                                 |                               |
9  |                            |                                   |                                 |                                   |                                 |                               |
10 | STA ATT                    | X XX X_05.04.2024 16:40.STA ATT   |                                 | X XX X_05.04.2024 18:37.STA ATT   | X XX X_05.04.2024 16:40.STA ATT |                               |
11 | TEC                        | X XX X_05.04.2024 16:40.TEC       | X XX X_05.04.2024 16:40.TEC     | X XX X_05.04.2024 18:37.TEC       |                                 |                               |
12 | A SPV                      |                                   | X XX X_05.04.2024 16:40.A SPV   | X XX X_05.04.2024 18:37.A SPV     |                                 | X XX X_06.04.2024 10:11.A SPV |
13 | D C P                      |                                   | X XX X_05.04.2024 16:40.D C P   |                                   |                                 |                               |
14 | A SPV_05.04.2024 16:40     |                                   |                                 |                                   |                                 |                               |
15 | D C P_05.04.2024 16:40     |                                   |                                 |                                   |                                 |                               |
16 | STA ATT_05.04.2024 16:40   |                                   |                                 |                                   |                                 |                               |
17 | TEC_05.04.2024 16:40       |                                   |                                 |                                   |                                 |                               |
18 | A SPV_05.04.2024 18:37     |                                   |                                 |                                   |                                 |                               |
19 | STA ATT_05.04.2024 18:37   |                                   |                                 |                                   |                                 |                               |
20 | TEC_05.04.2024 18:37       |                                   |                                 |                                   |                                 |                               |
21 | A SPV_06.04.2024 10:11     |                                   |                                 |                                   |                                 |                               |


#Code

def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""


cont = 0
val_a = ""
val_b = ""
val_c = ""
val_d = ""
val_cel_2 = ""
for colum in range(2,wsa.max_column+1):
    cont = 0
    Ul_pa_0 = ""
    val_cel_2 = ""
    for ro in range(15,wsa.max_row+1):
        val_cel_2 = ""
        In_St_At= 0
        In_Te= 0
        In_App_SPV = 0
        In_Da_Ci_Pr = 0
        val_cel_2 = str(wsa.cell(ro,colum).value)
        if val_cel_2 != None:
            if "_" in val_cel_2:
                Ul_pa_0 = find_between_r(val_cel_2, "_",".")
                wsa.cell(6,colum).value = Ul_pa_0
                if cont < 1:
                    cont = cont +1
                    ro = ro
                    ro1= ro +1
                    ro2 = ro +2
                    ro3 = ro +3
                    
                    Val = wsa.cell(ro,colum).value
                    Val1 = wsa.cell(ro1,colum).value
                    Val2 = wsa.cell(ro2,colum).value
                    Val3 = wsa.cell(ro3,colum).value

                    if Val == None or Val == '':
                        Val = "Null"
                        val_a  = "Null"
                    if Val1 == None or Val1 == '':
                        Val1 = "Null"
                        val_b  = "Null"
                    if Val2 == None or Val2 == '':
                        Val2 = "Null"
                        val_c  = "Null"
                    if Val3 == None or Val3 == '':
                        Val3 = "Null"
                        val_d  = "Null"

                    if Val != None and Val != "Null":
                        val_a = Val.split("_")[1]
                        Val = Val.split("_")[0]
                    if Val1 != None and Val1 != "Null":
                        val_b = Val1.split("_")[1]
                        Val1 = Val1.split("_")[0]
                    if Val2 != None and Val2 != "Null":
                        val_c = Val2.split("_")[1]
                        Val2 = Val2.split("_")[0]
                    if Val3 != None and Val3 != "Null":
                        val_d = Val3.split("_")[1]
                        Val3 = Val3.split("_")[0]

                    if "STA ATT" in val_a and Val != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val
                        In_St_At= 1
                    if "STA ATT" in val_b and Val1 != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val1
                        In_St_At= 1
                    if "STA ATT" in val_c and Val2 != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val2
                        In_St_At= 1
                    if "STA ATT" in val_d and Val3 != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val3
                        In_St_At= 1
                    if "TEC" in val_a and Val != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val
                        In_Te= 1
                    if "TEC" in val_b and Val1 != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val1
                        In_Te= 1
                    if "TEC" in val_c and Val2 != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val2
                        In_Te= 1
                    if "TEC" in val_d and Val3 != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val3
                        In_Te= 1
                    if "A SPV" in val_a and Val != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val
                        In_App_SPV = 1
                    if "A SPV" in val_b and Val1 != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val1
                        In_App_SPV = 1
                    if "A SPV" in val_c and Val2 != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val2
                        In_App_SPV = 1
                    if "A SPV" in val_d and Val3 != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val3
                        In_App_SPV = 1
                    if "D C P" in val_a and Val != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val
                        In_Da_Ci_Pr = 1
                    if "D C P" in val_b and Val1 != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val1
                        In_Da_Ci_Pr = 1
                    if "D C P" in val_c and Val2 != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val2
                        In_Da_Ci_Pr = 1
                    if "D C P" in val_d and Val3 != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val3
                        In_Da_Ci_Pr = 1

How can I improve the performance using pandas or a faster method?

python pandas dataframe openpyxl xlsx
1 Answer

IIUC, you can use a merge for each column, after extracting the suffix of each Series:

out = df.copy()

ref = df.iloc[10:, 1].reset_index(name='ref')

out.iloc[10:, 2:] = (
    df.iloc[14:, 2:]
      .apply(lambda s: ref
             .merge(s.rename('out'), how='left', left_on='ref',
                    right_on=s.str.extract(r'([^.]+)$', expand=False))
             .set_index('index')['out'])
)

Note that, as in your description, the position of the window to use is hardcoded (from column 2 onward, from row 10 onward, from row 14 onward).
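If the hardcoded positions are a concern, they could probably be derived from the label column instead. The following is only a sketch under assumptions that are not in the question: the first "STA ATT" label marks the start of the target rows, and the first label containing "_" marks the start of the source rows. The helper name locate_window is hypothetical.

```python
import numpy as np
import pandas as pd


def locate_window(labels: pd.Series, first_target: str = "STA ATT") -> tuple[int, int]:
    """Return (target_start, source_start) as positional row numbers.

    Assumption: the first label equal to `first_target` opens the target rows,
    and the first label containing "_" opens the source rows.
    """
    is_target = labels.eq(first_target).to_numpy()
    is_source = labels.str.contains("_", regex=False, na=False).to_numpy()
    if not is_target.any() or not is_source.any():
        raise ValueError("could not locate the target or source rows")
    return int(np.flatnonzero(is_target)[0]), int(np.flatnonzero(is_source)[0])


# Label column copied from the sample input (empty cells as None).
labels = pd.Series(
    ["Unnamed: 0", "APP SPV", "DAT", "ST AT", "TEC", "DATE", None, None, None, None,
     "STA ATT", "TEC", "A SPV", "D C P",
     "A SPV_05.04.2024 16:40", "D C P_05.04.2024 16:40",
     "STA ATT_05.04.2024 16:40", "TEC_05.04.2024 16:40",
     "A SPV_05.04.2024 18:37", "STA ATT_05.04.2024 18:37",
     "TEC_05.04.2024 18:37", "A SPV_06.04.2024 10:11"]
)

target_start, source_start = locate_window(labels)
```

The two returned numbers would then stand in for the literal 10 and 14 in the iloc slices above. Matching on "TEC" alone would not work, since that label also appears in row 4.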

Output:

     0                         1                                2                              3                                4                                5                              6
0    0                Unnamed: 0                              095                            088                              088                              713                            714
1    1                   APP SPV                              NaN                            NaN                              NaN                              NaN                            NaN
2    2                       DAT                              NaN                     11/04/2024                       11/04/2024                              NaN                            NaN
3    3                     ST AT                               AN                            ABA                              ABA                              NaN                             PP
4    4                       TEC                              TOP                            NaN                              NaN                              CAS                            NaN
5    5                      DATE                              NaN                            NaN                              NaN                              NaN                            NaN
6    6                       NaN                              NaN                            NaN                              NaN                              NaN                            NaN
7    7                       NaN                              NaN                            NaN                              NaN                              NaN                            NaN
8    8                       NaN                              NaN                            NaN                              NaN                              NaN                            NaN
9    9                       NaN                              NaN                            NaN                              NaN                              NaN                            NaN
10  10                   STA ATT  X XX X_05.04.2024 16:40.STA ATT                            NaN  X XX X_05.04.2024 18:37.STA ATT  X XX X_05.04.2024 16:40.STA ATT                            NaN
11  11                       TEC      X XX X_05.04.2024 16:40.TEC    X XX X_05.04.2024 16:40.TEC      X XX X_05.04.2024 18:37.TEC                              NaN                            NaN
12  12                     A SPV                              NaN  X XX X_05.04.2024 16:40.A SPV    X XX X_05.04.2024 18:37.A SPV                              NaN  X XX X_06.04.2024 10:11.A SPV
13  13                     D C P                              NaN  X XX X_05.04.2024 16:40.D C P                              NaN                              NaN                            NaN
14  14    A SPV_05.04.2024 16:40                              NaN                            NaN                              NaN                              NaN                            NaN
15  15    D C P_05.04.2024 16:40                              NaN                            NaN                              NaN                              NaN                            NaN
16  16  STA ATT_05.04.2024 16:40                              NaN                            NaN                              NaN                              NaN                            NaN
17  17      TEC_05.04.2024 16:40                              NaN                            NaN                              NaN                              NaN                            NaN
18  18    A SPV_05.04.2024 18:37                              NaN                            NaN                              NaN                              NaN                            NaN
19  19  STA ATT_05.04.2024 18:37                              NaN                            NaN                              NaN                              NaN                            NaN
20  20      TEC_05.04.2024 18:37                              NaN                            NaN                              NaN                              NaN                            NaN
21  21    A SPV_06.04.2024 10:11                              NaN                            NaN                              NaN                              NaN                            NaN
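
As a possible variation (not the merge approach above), one could loop over the 4 tags instead of over the columns, which may help when there are hundreds of columns. This is a sketch under assumptions: the names TARGET_ROWS, SOURCE_START, FIRST_COL and move_tagged_cells are hypothetical, the first matching cell per column wins, and the 4-cell window rule from the question is not enforced. The sample frame below only fills the cells of columns 095 and the first 088.

```python
import numpy as np
import pandas as pd

# Hypothetical constants mirroring the hardcoded layout from the question.
TARGET_ROWS = {"STA ATT": 10, "TEC": 11, "A SPV": 12, "D C P": 13}
SOURCE_START = 14  # first row scanned for tagged cells
FIRST_COL = 2      # first data column


def move_tagged_cells(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    src = df.iloc[SOURCE_START:, FIRST_COL:]
    # Text after the last "." in every cell, e.g. "STA ATT"; NaN stays NaN.
    suffix = src.apply(lambda s: s.str.extract(r"([^.]+)$", expand=False))
    for tag, row in TARGET_ROWS.items():
        # Keep only the cells carrying this tag, then pull the first hit
        # of each column up to the top row of the window.
        first_hit = src.where(suffix.eq(tag)).bfill().iloc[0]
        out.iloc[row, FIRST_COL:] = first_hit.to_numpy()
    # Clear the source rows once their values have been copied.
    out.iloc[SOURCE_START:, FIRST_COL:] = np.nan
    return out


# Small sample: 22 empty rows x 7 columns, with a few tagged cells filled in.
df = pd.DataFrame(np.full((22, 7), np.nan, dtype=object))
stamp = "X XX X_05.04.2024 16:40."
cells = {
    (16, 2): stamp + "STA ATT", (17, 2): stamp + "TEC",
    (14, 3): stamp + "A SPV", (15, 3): stamp + "D C P", (17, 3): stamp + "TEC",
}
for (r, c), value in cells.items():
    df.iat[r, c] = value

out = move_tagged_cells(df)
```

Whether this is actually faster than the per-column merge on the real 482 x 8k data would need to be measured.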