How to extract text data from tables in a docx document

Question (votes: 0, answers: 1)

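I want to extract text from docx documents. I came up with a script that does this, but I noticed that some documents contain tables, and the script does not pick up the text inside them. How can I improve the script below?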


import glob
import os

import docx  # provided by the python-docx package

with open('your_file.txt', 'w', encoding='utf-8') as f:
    for directory in glob.glob('fi*'):
        for filename in glob.glob(os.path.join(directory, "*")):
            # python-docx reads only .docx; legacy .doc files must be converted first
            if filename.endswith(".docx"):
                document = docx.Document(filename)
                # document.paragraphs covers body paragraphs only, not text inside tables
                for paragraph in document.paragraphs:
                    if paragraph.text:
                        f.write("%s\n" % paragraph.text)


A docx with a table:


python python-3.x docx extraction python-docx
1 answer (0 votes)

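Try the python-docx module (pip install python-docx). It exposes the tables of a document through the tables attribute, separately from paragraphs: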

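import docx

doc = docx.Document("document.docx")

# doc.tables lists the tables found in the document body
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            # cell.text is the combined text of the paragraphs in the cell
            print(cell.text)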
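
To connect this back to the script in the question, here is a minimal sketch that collects both paragraph text and table text from one document. The helper names table_lines and document_lines and the tab separator are assumptions made for illustration; they are not part of python-docx. The helpers rely only on the attributes used above (paragraphs, tables, rows, cells, text). Note that this sketch emits all paragraphs first and all tables afterwards, so the original document order is not preserved, and merged cells may show up more than once.

```python
def table_lines(table, sep="\t"):
    """Yield one line per table row, with cell texts joined by `sep`.

    `table` is expected to behave like a python-docx table: it has `.rows`,
    each row has `.cells`, and each cell has `.text`.
    """
    for row in table.rows:
        # Collapse line breaks and repeated spaces inside each cell.
        cells = [" ".join(cell.text.split()) for cell in row.cells]
        if any(cells):  # skip rows whose cells are all empty
            yield sep.join(cells)


def document_lines(document):
    """Yield non-empty paragraph texts, then the rows of every table."""
    for paragraph in document.paragraphs:
        if paragraph.text:
            yield paragraph.text
    for table in document.tables:
        yield from table_lines(table)


# Hypothetical use inside the loop from the question:
#     document = docx.Document(filename)
#     for line in document_lines(document):
#         f.write("%s\n" % line)
```

Writing one row per line keeps the output file readable; a different separator can be passed to table_lines if the cell text itself may contain tabs.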