Add a substring to every non-empty cell with pandas

Question

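I have the following xlsx sheet as input, and I want to produce the output below with pandas. I currently use openpyxl, but the file has 8k rows and more than 200 columns, so my code is inefficient and takes more than 20 minutes to run. Like the example, the real file contains NaN and empty cells, and I only want to modify the non-empty cells. As the code shows, the processing must run from column 2, row 5 to the end of the file.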

# input
            Main_col        0    1    2
0  cas1 1_05.04.2024 16:40  A    B   
1  cas2 5_05.04.2024 16:41       C   
2  cas3 4_05.04.2024 17:30  D         E

# output
            Main_col                    0                          1                        2
0  cas1 1_05.04.2024 16:40  A_05.04.2024 16:40.cas1 1  B_05.04.2024 16:40.cas1 1                    
1  cas2 5_05.04.2024 16:41                             C_05.04.2024 16:41.cas2 5                    
2  cas3 4_05.04.2024 17:30  D_05.04.2024 17:30.cas3 4                            E_05.04.2024 17:30.cas3 4

The code I am using is below.

for colonn in range(2, ws.max_column + 1):
    for rig in range(5, ws.max_row + 1):
        Valor = ws.cell(rig, colonn).value
        # test the raw value: str(None) is "None", so a check after str() never catches empty cells
        if Valor is None or Valor == " ":
            continue
        ValoreCell = str(Valor)
        if "_" in ValoreCell:
            # part of the row label before the first underscore, e.g. "cas1 1"
            Valoreheader = str(ws.cell(rig, 1).value).split("_")[0]
            ws.cell(rig, colonn).value = ValoreCell + "." + Valoreheader

I am new to Stack Overflow.

I am hoping for a significant performance improvement from using pandas.

Answer 1

You can build a mask and use boolean indexing:

# True for non-empty cells (NaN and '' count as empty)
mask = df.fillna('').ne('')
# exclude Main_col from mask
mask['Main_col'] = False

# reorder the "casX YYY" into "YYY.casX"
s = df['Main_col'].str.replace(r'([^ ]+) (.*)', r'\2.\1', regex=True)

# concatenate strings
df[mask] = df.astype(str).add(s, axis=0)

Output:

                  Main_col                         0                         1                         2
0  cas1 1_05.04.2024 16:40  A1_05.04.2024 16:40.cas1  B1_05.04.2024 16:40.cas1                          
1  cas2 5_05.04.2024 16:41                            C5_05.04.2024 16:41.cas2                          
2  cas3 4_05.04.2024 17:30  D4_05.04.2024 17:30.cas3                            E4_05.04.2024 17:30.cas3
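
Note that the output above moves the digit after `cas1` in front of the underscore (`A1_05.04.2024 16:40.cas1`), whereas the expected output in the question keeps `cas1 1` together (`A_05.04.2024 16:40.cas1 1`). Below is a minimal sketch of one way to get the question's format with the same mask technique. It assumes every `Main_col` value contains exactly one underscore separating the label from the timestamp, and it rebuilds the sample frame by hand, so the mix of `""` and NaN in the empty cells is an assumption.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; which empty cells are "" and which
# are NaN is an assumption made for illustration.
df = pd.DataFrame({
    "Main_col": ["cas1 1_05.04.2024 16:40",
                 "cas2 5_05.04.2024 16:41",
                 "cas3 4_05.04.2024 17:30"],
    0: ["A", np.nan, "D"],
    1: ["B", "C", ""],
    2: ["", np.nan, "E"],
})

# Split "cas1 1_05.04.2024 16:40" at the first underscore only:
# column 0 holds "cas1 1", column 1 holds "05.04.2024 16:40".
parts = df["Main_col"].str.split("_", n=1, expand=True)

# Per-row suffix in the question's format: "_05.04.2024 16:40.cas1 1".
suffix = "_" + parts[1] + "." + parts[0]

# True where a cell holds real content; NaN and "" count as empty.
mask = df.fillna("").ne("")
mask["Main_col"] = False  # never modify the key column

# Append each row's suffix to every cell, then keep the result
# only where the mask is True; all other cells stay as they were.
df[mask] = df.astype(str).add(suffix, axis=0)

print(df)
```

If the first rows must stay untouched, as in the question's loop that starts at worksheet row 5, the mask can be cleared for them before the assignment, for example with `mask.iloc[:4] = False`. How worksheet row 5 maps to a DataFrame row depends on how the sheet was read, so the `4` here is an assumption.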
