I have the following input table and want to obtain the output table.
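To do this, I have to check the last part of each cell's content from row 14 to the end of the file and, if there is a match, move the entire cell value into row 10, 11, 12, or 13 ("STA ATT", "TEC", "A SPV", "D C P"). I have to do this for every column, up to the last one.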
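I wrote code that does this and it works, but it is messy, inefficient, and slow. I wrote it with openpyxl, which I suspect is the slowest way to do it; I think pandas would offer the fastest approach.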
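In the example below:
For column 095, I have to move the content of row 16 to cell 10 and the content of row 17 to cell 11.
For the first column 088, I have to move the content of row 14 to cell 12, row 15 to cell 13, and row 17 to cell 11.
For the second column 088, I have to move the content of row 18 to cell 12, row 19 to cell 10, 16 to cell 10, 20 to cell 11, and so on.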
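Starting from ROW 14, each column has only 4 values that need to be moved to the corresponding ROW. These values are not necessarily consecutive, but starting from the first value (which may be an empty cell), the remaining values are in the next 3 cells. For example, in column 095 I will move only 2 values, X XX X_05.04.2024 16:40.STA ATT and X XX X_05.04.2024 16:40.TEC, because the first two cells are empty.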
Can you help me improve the performance? With the code I wrote, the task takes 15 minutes for a df of 482 columns and 8k rows.
# Input
0 | Unnamed: 0 | 095 | 088 | 088 | 713 | 714 |
1 | APP SPV | | | | | |
2 | DAT | | 11/04/2024 | 11/04/2024 | | |
3 | ST AT | AN | ABA | ABA | | PP |
4 | TEC | TOP | | | CAS | |
5 | DATE | | | | | |
6 | | | | | | |
7 | | | | | | |
8 | | | | | | |
9 | | | | | | |
10 | STA ATT | | | | | |
11 | TEC | | | | | |
12 | A SPV | | | | | |
13 | D C P | | | | | |
14 | A SPV_05.04.2024 16:40 | | X XX X_05.04.2024 16:40.A SPV | | | |
15 | D C P_05.04.2024 16:40 | | X XX X_05.04.2024 16:40.D C P | | | |
16 | STA ATT_05.04.2024 16:40 | X XX X_05.04.2024 16:40.STA ATT | | | X XX X_05.04.2024 16:40.STA ATT | |
17 | TEC_05.04.2024 16:40 | X XX X_05.04.2024 16:40.TEC | X XX X_05.04.2024 16:40.TEC | | | |
18 | A SPV_05.04.2024 18:37 | | | X XX X_05.04.2024 18:37.A SPV | | |
19 | STA ATT_05.04.2024 18:37 | | | X XX X_05.04.2024 18:37.STA ATT | | |
20 | TEC_05.04.2024 18:37 | | | X XX X_05.04.2024 18:37.TEC | | |
21 | A SPV_06.04.2024 10:11 | | | | | X XX X_06.04.2024 10:11.A SPV |
# Output
0 | Unnamed: 0 | 095 | 088 | 088 | 713 | 714 |
1 | APP SPV | | | | | |
2 | DAT | | 11/04/2024 | 11/04/2024 | | |
3 | ST AT | AN | ABA | ABA | | PP |
4 | TEC | TOP | | | CAS | |
5 | DATE | | | | | |
6 | | | | | | |
7 | | | | | | |
8 | | | | | | |
9 | | | | | | |
10 | STA ATT | X XX X_05.04.2024 16:40.STA ATT | | X XX X_05.04.2024 18:37.STA ATT | X XX X_05.04.2024 16:40.STA ATT | |
11 | TEC | X XX X_05.04.2024 16:40.TEC | X XX X_05.04.2024 16:40.TEC | X XX X_05.04.2024 18:37.TEC | | |
12 | A SPV | | X XX X_05.04.2024 16:40.A SPV | X XX X_05.04.2024 18:37.A SPV | | X XX X_06.04.2024 10:11.A SPV |
13 | D C P | | X XX X_05.04.2024 16:40.D C P | | | |
14 | A SPV_05.04.2024 16:40 | | | | | |
15 | D C P_05.04.2024 16:40 | | | | | |
16 | STA ATT_05.04.2024 16:40 | | | | | |
17 | TEC_05.04.2024 16:40 | | | | | |
18 | A SPV_05.04.2024 18:37 | | | | | |
19 | STA ATT_05.04.2024 18:37 | | | | | |
20 | TEC_05.04.2024 18:37 | | | | | |
21 | A SPV_06.04.2024 10:11 | | | | | |
# Code
def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""

cont = 0
val_a = ""
val_b = ""
val_c = ""
val_d = ""
val_cel_2 = ""
for colum in range(2,wsa.max_column+1):
    cont = 0
    Ul_pa_0 = ""
    val_cel_2 = ""
    for ro in range(15,wsa.max_row+1):
        val_cel_2 = ""
        In_St_At= 0
        In_Te= 0
        In_App_SPV = 0
        In_Da_Ci_Pr = 0
        val_cel_2 = str(wsa.cell(ro,colum).value)
        if val_cel_2 != None:
            if "_" in val_cel_2:
                Ul_pa_0 = find_between_r(val_cel_2, "_",".")
                wsa.cell(6,colum).value = Ul_pa_0
                if cont < 1:
                    cont = cont +1
                    ro = ro
                    ro1= ro +1
                    ro2 = ro +2
                    ro3 = ro +3
                    Val = wsa.cell(ro,colum).value
                    Val1 = wsa.cell(ro1,colum).value
                    Val2 = wsa.cell(ro2,colum).value
                    Val3 = wsa.cell(ro3,colum).value
                    if Val == None or Val == '':
                        Val = "Null"
                        val_a = "Null"
                    if Val1 == None or Val1 == '':
                        Val1 = "Null"
                        val_b = "Null"
                    if Val2 == None or Val2 == '':
                        Val2 = "Null"
                        val_c = "Null"
                    if Val3 == None or Val3 == '':
                        Val3 = "Null"
                        val_d = "Null"
                    if Val != None and Val != "Null":
                        val_a = Val.split("_")[1]
                        Val = Val.split("_")[0]
                    if Val1 != None and Val1 != "Null":
                        val_b = Val1.split("_")[1]
                        Val1 = Val1.split("_")[0]
                    if Val2 != None and Val2 != "Null":
                        val_c = Val2.split("_")[1]
                        Val2 = Val2.split("_")[0]
                    if Val3 != None and Val3 != "Null":
                        val_d = Val3.split("_")[1]
                        Val3 = Val3.split("_")[0]
                    if "STA ATT" in val_a and Val != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val
                        In_St_At= 1
                    if "STA ATT" in val_b and Val1 != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val1
                        In_St_At= 1
                    if "STA ATT" in val_c and Val2 != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val2
                        In_St_At= 1
                    if "STA ATT" in val_d and Val3 != "Null" and In_St_At!= 1:
                        wsa.cell(11,colum).value = Val3
                        In_St_At= 1
                    if "TEC" in val_a and Val != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val
                        In_Te= 1
                    if "TEC" in val_b and Val1 != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val1
                        In_Te= 1
                    if "TEC" in val_c and Val2 != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val2
                        In_Te= 1
                    if "TEC" in val_d and Val3 != "Null" and In_Te!= 1:
                        wsa.cell(12,colum).value = Val3
                        In_Te= 1
                    if "A SPV" in val_a and Val != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val
                        In_App_SPV = 1
                    if "A SPV" in val_b and Val1 != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val1
                        In_App_SPV = 1
                    if "A SPV" in val_c and Val2 != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val2
                        In_App_SPV = 1
                    if "A SPV" in val_d and Val3 != "Null" and In_App_SPV != 1:
                        wsa.cell(13,colum).value = Val3
                        In_App_SPV = 1
                    if "D C P" in val_a and Val != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val
                        In_Da_Ci_Pr = 1
                    if "D C P" in val_b and Val1 != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val1
                        In_Da_Ci_Pr = 1
                    if "D C P" in val_c and Val2 != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val2
                        In_Da_Ci_Pr = 1
                    if "D C P" in val_d and Val3 != "Null" and In_Da_Ci_Pr != 1:
                        wsa.cell(14,colum).value = Val3
                        In_Da_Ci_Pr = 1
Improve performance with pandas or a faster method
Using a merge:
out = df.copy()
ref = df.iloc[10:, 1].reset_index(name='ref')
out.iloc[10:, 2:] = (
    df.iloc[14:, 2:]
      .apply(lambda s: ref
             .merge(s.rename('out'), how='left', left_on='ref',
                    right_on=s.str.extract(r'([^.]+)$', expand=False))
             .set_index('index')['out'])
)
Note that, as in your description, the positions of the windows to use are hardcoded (after column 2, after row 10, after row 14).
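If the layout can shift between files, the two row offsets could be looked up from the label column instead of being hardcoded. The following is only a sketch under two assumptions that the question does not state explicitly: the target block starts at the label "STA ATT", and every source label contains an underscore (name, then timestamp). The function name `find_window_rows` is hypothetical.

```python
def find_window_rows(labels, first_target="STA ATT"):
    """Return (target_start, source_start) as 0-based row positions."""
    # Assumption: the target block begins at the first row labelled `first_target`.
    target_start = labels.index(first_target)
    # Assumption: source labels look like "<name>_<timestamp>", so they contain "_".
    source_start = next(
        i for i in range(target_start + 1, len(labels)) if "_" in labels[i]
    )
    return target_start, source_start

# Label column of the sample table, truncated after the first two source rows.
labels = [
    "Unnamed: 0", "APP SPV", "DAT", "ST AT", "TEC", "DATE", "", "", "", "",
    "STA ATT", "TEC", "A SPV", "D C P",
    "A SPV_05.04.2024 16:40", "D C P_05.04.2024 16:40",
]
print(find_window_rows(labels))  # (10, 14)
```

With a DataFrame, the list could be built with something like `df.iloc[:, 1].fillna("").tolist()`, and the two returned positions would replace the literal 10 and 14 in the `iloc` slices.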
Output of the merge snippet above:
0 1 2 3 4 5 6
0 0 Unnamed: 0 095 088 088 713 714
1 1 APP SPV NaN NaN NaN NaN NaN
2 2 DAT NaN 11/04/2024 11/04/2024 NaN NaN
3 3 ST AT AN ABA ABA NaN PP
4 4 TEC TOP NaN NaN CAS NaN
5 5 DATE NaN NaN NaN NaN NaN
6 6 NaN NaN NaN NaN NaN NaN
7 7 NaN NaN NaN NaN NaN NaN
8 8 NaN NaN NaN NaN NaN NaN
9 9 NaN NaN NaN NaN NaN NaN
10 10 STA ATT X XX X_05.04.2024 16:40.STA ATT NaN X XX X_05.04.2024 18:37.STA ATT X XX X_05.04.2024 16:40.STA ATT NaN
11 11 TEC X XX X_05.04.2024 16:40.TEC X XX X_05.04.2024 16:40.TEC X XX X_05.04.2024 18:37.TEC NaN NaN
12 12 A SPV NaN X XX X_05.04.2024 16:40.A SPV X XX X_05.04.2024 18:37.A SPV NaN X XX X_06.04.2024 10:11.A SPV
13 13 D C P NaN X XX X_05.04.2024 16:40.D C P NaN NaN NaN
14 14 A SPV_05.04.2024 16:40 NaN NaN NaN NaN NaN
15 15 D C P_05.04.2024 16:40 NaN NaN NaN NaN NaN
16 16 STA ATT_05.04.2024 16:40 NaN NaN NaN NaN NaN
17 17 TEC_05.04.2024 16:40 NaN NaN NaN NaN NaN
18 18 A SPV_05.04.2024 18:37 NaN NaN NaN NaN NaN
19 19 STA ATT_05.04.2024 18:37 NaN NaN NaN NaN NaN
20 20 TEC_05.04.2024 18:37 NaN NaN NaN NaN NaN
21 21 A SPV_06.04.2024 10:11 NaN NaN NaN NaN NaN
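For comparison, here is a sketch of a different technique (melt, then groupby and unstack) that processes all columns in one pass instead of merging column by column. It is not the merge approach shown earlier, and it has not been timed on the 482 x 8k frame. Assumptions: the suffix after the last dot must equal the target label exactly; when several cells in one column share a suffix, the topmost one is kept; the 4-cell window rule from the question is not enforced. The function name and the tiny demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def move_by_suffix(df, label_col, first_data_col, target_start, source_start):
    """Move source cells into the target row whose label equals the cell's suffix."""
    out = df.copy()
    labels = df.iloc[target_start:source_start, label_col].to_numpy()

    # Use positional column numbers so duplicate headers (e.g. two "088") are safe.
    src = df.iloc[source_start:, first_data_col:].copy()
    n_cols = src.shape[1]
    src.columns = range(n_cols)

    # Long format: one row per non-empty source cell, plus its suffix key.
    long = (
        src.melt(var_name="col", value_name="value")
           .dropna(subset=["value"])
           .assign(key=lambda d: d["value"].str.extract(r"([^.]+)$", expand=False))
    )

    # Assumption: keep the first (topmost) cell per (suffix, column) pair.
    wide = long.groupby(["key", "col"])["value"].first().unstack("col")
    wide = wide.reindex(index=labels, columns=range(n_cols))

    # Write the matched values into the target rows and blank the source rows.
    out.iloc[target_start:source_start, first_data_col:] = wide.to_numpy()
    out.iloc[source_start:, first_data_col:] = np.nan
    return out

# Tiny made-up frame: labels in column 0, two data columns, targets in rows 0-1.
demo = pd.DataFrame([
    ["STA ATT", None, None],
    ["TEC", None, None],
    ["TEC_t1", "a_t1.TEC", None],
    ["STA ATT_t1", "a_t1.STA ATT", "b_t1.STA ATT"],
], dtype=object)
result = move_by_suffix(demo, label_col=0, first_data_col=1,
                        target_start=0, source_start=2)
print(result)
```

For the layout in the question, the call would presumably be `move_by_suffix(df, label_col=1, first_data_col=2, target_start=10, source_start=14)`.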