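I have a question about automation with Selenium (Python) in PyCharm.
My test data is saved as an .xls file, but it is really an HTML file disguised as XLS.
I found code for converting it to HTML.
Here is the code: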
bunch_size = 10000000  # Experiment with different sizes
bunch = []
with open(test_path+final_date+".xls", "r") as r, open(location_of_html, "w") as w:
    for line in r:
        print(line)
        x, y, z, rest = line.split(' ', 3)
        bunch.append(' '.join((x[:-3], y[:-3], z[:-3], rest)))
        if len(bunch) == bunch_size:
            w.writelines(bunch)
            bunch = []
    w.writelines(bunch)
The lines produced by the code above are correct:
<table style='height: 184px;' width='518'>
<tbody>
<tr>
<td style='text-align: center;'> <img /></td>
<td style='text-align: center;' colspan='3'><span style='font-size: 12pt;'> <strong>Villanueva Enterprise</strong> </span><br /><span style='font-size: 10pt;'> <strong>Payslip</strong> </span></td>
<td> </td>
</tr>
<tr>
<td><span style='font-size: 8pt;'>NAME:</span></td>
But in the converted final product, the resulting markup is:
<ta style="heig 184p width=" 518'="">
<img>
<span style="font-size: 12pt;"> <strong>Villanueva Enterprise</strong> </span><br><span style="font-size: 10pt;"> <strong>Payslip</strong> </span>
<span style="font-size: 8pt;">NAME:</span>
<span style="font-size: 8pt;">Earner Minimum Wage 620,350</span>
<span style="font-size: 8pt;">PAYROLL DATE: </span>
<span style="font-size: 8pt;">26 March 2021 </span>
Any ideas?
I managed to solve this. I simply removed the split step and wrote the disguised .xls file out directly, line by line:
with open(test_path + final_date + ".xls", "r") as r, open(location_of_html, "w") as w:
    for line in r:
        print(line)
        w.write(line)
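For anyone wondering why removing the split fixes it, here is a minimal sketch that applies the same split-and-slice step to the first line of the sample. The helper name `mangle` is hypothetical and only wraps the two statements from the question's loop body. It suggests that `[:-3]` cuts the last three characters off each of the first three space-separated tokens, which would explain the truncated `<ta style="heig 184p ...` tag in the converted file.

```python
def mangle(line: str) -> str:
    """Apply the split-and-slice step from the question to one line.

    `mangle` is a hypothetical helper name used only for this illustration.
    """
    # Split into at most four pieces on single spaces.
    x, y, z, rest = line.split(' ', 3)
    # Drop the last three characters of the first three pieces, then rejoin.
    return ' '.join((x[:-3], y[:-3], z[:-3], rest))


original = "<table style='height: 184px;' width='518'>\n"
print(mangle(original), end="")
# → <ta style='heig 184p width='518'>
```

Note also that a line with fewer than three spaces, such as `<tbody>`, cannot be unpacked into four names and raises `ValueError`, so the slicing step looks like it was written for a different, column-based file layout rather than for HTML.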