Writing tables from Word (.docx) to Excel (.xlsx) using xlsxwriter

Question  Votes: 1  Answers: 2

I am trying to parse the tables in a Word (.docx) document and then copy those tables to Excel using xlsxwriter. This is my code:

from docx.api import Document
import xlsxwriter

document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 - for merge.docx')
tables = document.tables

wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xxx/test clause retrieval.xlsx')
Sheet1 = wb.add_worksheet("Compliance")
index_row = 0

print(len(tables))

for table in document.tables:
    data = []
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)

        if i == 0:
            keys = tuple(text)
            continue
        row_data = dict(zip(keys, text))
        data.append(row_data)
        #print (data)
        #big_data.append(data)
        Sheet1.write(index_row, 0, str(row_data))
        index_row = index_row + 1

print(row_data)

wb.close()

This is the output I want (screenshot not included):

However, this is my actual output (screenshot not included):

I understand that my current code produces a list of strings, one per table row.
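A small illustration of why that happens, using made-up header and cell values in place of a real table row: `str()` turns the whole dictionary into one string, and `Sheet1.write(index_row, 0, ...)` then puts that single string into column 0.

```python
# Hypothetical header and cell values standing in for one parsed table row.
keys = ("Clause", "Status")
text = ("1.1", "Compliant")

# Same steps as in the loop above: pair headers with cell text, then stringify.
row_data = dict(zip(keys, text))
cell_value = str(row_data)

# The whole row is now one string, so it can only fill one Excel cell.
print(cell_value)
```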

Is there any way I can get the desired output using xlsxwriter? Any help is greatly appreciated.

python python-3.x xlsxwriter python-docx
2 Answers
1
vote

I would use the pandas package instead of xlsxwriter, as follows:

from docx.api import Document
import pandas as pd

document = Document("D:/tmp/test.docx")
rows = []

# Collect the text of every cell, one list per table row
for table in document.tables:
    for row in table.rows:
        rows.append([cell.text for cell in row.cells])

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=["Column1", "Column2"])
df.to_excel("D:/tmp/test.xlsx")
print(df)

This prints the following, which is also what gets inserted into the Excel file:

  Column1 Column2
0   Hello    TEST
1     Est    Ting
2      Gg      ff

0
votes

This is the part of my updated code that gives me the output I want:

for row in block.rows:
    # Write each cell of the row into its own column
    for x, cell in enumerate(row.cells):
        print(cell.text)
        Sheet1.write(index_row, x, cell.text)
    index_row += 1
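For readers who want the fragment above as a complete script, here is a minimal sketch. The helper names `iter_cells` and `copy_tables` are hypothetical and not part of either library. It assumes python-docx and xlsxwriter are installed and that all tables should land on one sheet, stacked one below another.

```python
def iter_cells(tables, start_row=0):
    """Yield (row_index, col_index, text) for every cell of every table.

    `tables` is any iterable of objects exposing `.rows`, whose items expose
    `.cells`, whose items expose `.text` (as python-docx tables do).
    Rows of successive tables are stacked below one another.
    """
    index_row = start_row
    for table in tables:
        for row in table.rows:
            for col, cell in enumerate(row.cells):
                yield index_row, col, cell.text
            index_row += 1


def copy_tables(docx_path, xlsx_path, sheet_name="Compliance"):
    """Copy all tables of a .docx file into one worksheet (hypothetical helper)."""
    # Third-party imports are kept local so iter_cells works without them.
    from docx import Document
    import xlsxwriter

    document = Document(docx_path)
    wb = xlsxwriter.Workbook(xlsx_path)
    sheet = wb.add_worksheet(sheet_name)
    # One worksheet cell per table cell, instead of one string per row.
    for r, c, text in iter_cells(document.tables):
        sheet.write(r, c, text)
    wb.close()
```

Calling `copy_tables("input.docx", "output.xlsx")` with your own paths (the ones here are placeholders) would write each Word table cell to its own Excel cell.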
