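How do you read a doc file in Python? Some libraries raise errors when they try to read one. Sometimes we need to convert the doc file to docx first and then use a library that supports that format.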
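As a rough illustration of that conversion step, here is a minimal sketch that asks Word itself to save a .docx copy. It assumes Windows with Microsoft Word and the pywin32 package installed. The function names `docx_path_for` and `convert_doc_to_docx` are made up for this example, and 16 is the value of Word's `wdFormatDocumentDefault` save format.

```python
import os


def docx_path_for(doc_path):
    """Return the absolute path of doc_path with its extension changed to .docx."""
    base, _ext = os.path.splitext(os.path.abspath(doc_path))
    return base + ".docx"


def convert_doc_to_docx(doc_path):
    """Open doc_path in Word and save a .docx copy next to it (Windows + Word only)."""
    # Imported here so that docx_path_for() can be used without pywin32 installed.
    import win32com.client

    WD_FORMAT_DOCUMENT_DEFAULT = 16  # Word's WdSaveFormat value for .docx
    target = docx_path_for(doc_path)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(os.path.abspath(doc_path))
        doc.SaveAs2(target, FileFormat=WD_FORMAT_DOCUMENT_DEFAULT)
        doc.Close(False)  # close the document without saving further changes
    finally:
        word.Quit()  # always shut Word down, even if the conversion fails
    return target
```

Calling `convert_doc_to_docx("Test.doc")` should leave a `Test.docx` next to the original, which a docx-only library can then open.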
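I found many articles and blogs that talk about "docx" files, but only a few that talk about "doc" files. I got the basic idea from this answer and want to share it with everyone. The code below reads a "doc" file, splits its text into lines on "\r", and writes the lines that contain the word "test" to an Excel file.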
import win32com.client # to work with doc file
import os # to find the absolute path
import xlsxwriter # to write the Excel file
word = win32com.client.Dispatch("Word.Application")
word.Visible = False # keep the Word window hidden
full_path = os.path.abspath("Test.doc")
wb = word.Documents.Open(full_path) # Open() returns the opened document
docs = wb.Range().Text.split("\r") # read the text and store the lines in a list
cnt=0
workbook = xlsxwriter.Workbook('Test.xlsx')
worksheet = workbook.add_worksheet("Test")
row = 0
for line in docs: # go through each line of the Word file
    if "test" in line: # keep only the lines that contain the word "test"
        column = 0
        # write the matching line to the Excel file
        worksheet.write(row, column, line)
        cnt += 1
        row += 1
workbook.close()
print("Number of test lines:", cnt)
wb.Close(False) # close the document without saving changes
word.Quit()
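The split-and-filter step does not depend on Word at all, so it can be pulled into a small function and tried on a plain string. This is only a sketch: the helper name `lines_containing` and the sample text are made up for illustration. Like the code above, it assumes Word separates paragraphs with "\r" and that the match is case-sensitive.

```python
def lines_containing(text, keyword="test"):
    """Split Word text on paragraph marks ("\r") and keep lines that contain keyword."""
    lines = text.split("\r")
    return [line for line in lines if keyword in line]


# Sample text standing in for what docs.Range().Text would return.
sample = "first test line\rno match here\rsecond test\r"
matches = lines_containing(sample)
print("Number of test lines:", len(matches))
```

Keeping this logic separate makes it possible to check the filtering on any machine, and the resulting list can be passed to `worksheet.write` row by row as in the loop above.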