How do I read a doc file in Python and go through its text line by line?


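How can I read a doc file in Python? Some libraries raise errors when they try to read one. Sometimes we first need to convert the doc file to docx and then use a library that supports that format.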
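The conversion step mentioned above could be sketched as follows. This is a minimal sketch, assuming Windows with Microsoft Word and the pywin32 package installed; the helper names `docx_path_for` and `convert_doc_to_docx` are hypothetical and only serve the illustration. The value 12 is Word's `wdFormatXMLDocument` constant, which selects the docx format.

```python
import os

WD_FORMAT_XML_DOCUMENT = 12  # Word's wdFormatXMLDocument constant (docx)


def docx_path_for(doc_path):
    """Return the absolute path of the .docx file to create (hypothetical helper)."""
    base, _ext = os.path.splitext(os.path.abspath(doc_path))
    return base + ".docx"


def convert_doc_to_docx(doc_path):
    """Open a .doc file in Word and save a .docx copy next to it (hypothetical helper)."""
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    out_path = docx_path_for(doc_path)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(os.path.abspath(doc_path))
        doc.SaveAs2(out_path, WD_FORMAT_XML_DOCUMENT)  # (FileName, FileFormat)
        doc.Close(False)  # close without saving changes to the original
    finally:
        word.Quit()  # always shut the hidden Word instance down
    return out_path
```

Calling `convert_doc_to_docx("Test.doc")` should leave a `Test.docx` next to the original, which a docx library such as python-docx can then open.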

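I found many articles and blog posts that discuss "docx" files, but only a few that cover "doc" files. I picked up the basic idea from this answer and want to share it. The code below reads a "doc" file, splits its text into lines on "\r", and writes the lines that contain "test" to an Excel file.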

import os                  # to build the absolute path that Word needs
import win32com.client     # pywin32: drives Microsoft Word over COM (Windows only)
import xlsxwriter          # to write the Excel file

word = win32com.client.Dispatch("Word.Application")
word.Visible = False       # keep the Word window hidden
try:
    # Word does not share Python's working directory, so pass an absolute path
    doc = word.Documents.Open(os.path.abspath("Test.doc"))
    # Word ends every paragraph with "\r", so splitting on it gives one item per line
    lines = doc.Range().Text.split("\r")
    doc.Close(False)       # close the document without saving changes
finally:
    word.Quit()            # shut Word down even if reading failed

workbook = xlsxwriter.Workbook("Test.xlsx")
worksheet = workbook.add_worksheet("Test")

row = 0                    # next free row; also the number of matching lines
for line in lines:
    if "test" in line:     # keep only the lines that contain the word "test"
        worksheet.write(row, 0, line)   # write the line to column A
        row += 1

workbook.close()
print("Number of test is:", row)
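The splitting and filtering step can also be tried without Word. The sketch below is an illustration rather than part of the original code: the helper names `split_word_text` and `lines_containing` are hypothetical, and it assumes the text came from `Range().Text`, where a paragraph ends with "\r", a manual line break is "\x0b", and a table cell ends with "\x07". Unlike the code above, the match here ignores case.

```python
def split_word_text(text):
    """Split text taken from a Word range into a list of non-empty lines."""
    # "\x07" marks the end of a table cell; it is not part of the visible text.
    text = text.replace("\x07", "")
    # "\x0b" is a manual line break (Shift+Enter); treat it like a paragraph end.
    text = text.replace("\x0b", "\r")
    # Drop lines that are empty once surrounding whitespace is stripped.
    return [line.strip() for line in text.split("\r") if line.strip()]


def lines_containing(lines, keyword):
    """Return the lines that contain keyword, ignoring case (an assumption)."""
    keyword = keyword.lower()
    return [line for line in lines if keyword in line.lower()]


sample = "First test line\rNothing here\x0bTEST two\r\rcell test\r\x07"
matches = lines_containing(split_word_text(sample), "test")
print(len(matches), matches)
```

Each entry in `matches` could then be passed to `worksheet.write` exactly as in the loop above.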