Python: Importing text from an HTML or text document into Word

Question

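I have been looking at some of the documentation, but all of the work I have seen with docx is aimed at text that is already in a Word document. What I would like to know is whether there is a simple way to take text from an HTML or text document and import all of it into a Word document wholesale. It does not seem to like the string; it is too long.

My understanding of the documentation is that you have to work with text paragraph by paragraph. The task I want to do is relatively simple, but it is beyond my Python skills. I would like to set the margins on a Word document and then import the text into it so that the text follows the margins I specified.
Does anyone have any ideas? None of the previous posts I found were very helpful.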
import textwrap
import requests
from bs4 import BeautifulSoup
from docx import Document
from docx.shared import Inches

class DocumentWrapper(textwrap.TextWrapper):

    def wrap(self, text):
        split_text = text.split('\n\n')
        lines = [line for para in split_text for line in textwrap.TextWrapper.wrap(self, para)]
        return lines

page = requests.get("http://classics.mit.edu/Aristotle/prior.mb.txt")
soup = BeautifulSoup(page.text,"html.parser")

#we are going to pull in the text wrap extension that we have added.
#The typical width that we want tow
text_wrap_extension = DocumentWrapper(width=82,initial_indent="",fix_sentence_endings=True)
new_string = text_wrap_extension.fill(page.text)
final_document = "Prior_Analytics.txt"

with open(final_document, "w") as f:
    f.writelines(new_string)

document = Document(final_document)
### Specified margin specifications
sections = document.sections
for section in sections:
    section.top_margin = (Inches(1.00))
    section.bottom_margin = (Inches(1.00))
    section.right_margin = (Inches(1.00))
    section.left_margin = (Inches(1.00))

document.save(final_document)
The error I get is as follows:
docx.opc.exceptions.PackageNotFoundError: Package not found at 'Prior_Analytics.txt'

Answer 1

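This error simply means that there is no .docx file at the location you specified. You can therefore modify your code to create the file if it does not exist.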

final_document = "Prior_Analytics.txt"
with open(final_document, "w+") as f:
    f.writelines(new_string)

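You are providing a relative path. How do you know what Python's current working directory is? That directory is the starting point for the relative path you gave. A couple of lines like these will tell you: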
import os
print(os.path.realpath('./'))
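
To see where a relative name such as Prior_Analytics.txt would actually end up, it can be resolved against the current working directory. The following is a minimal sketch using the standard library's pathlib; the helper name resolve_output_path is made up for illustration and is not part of the original answer.

```python
from pathlib import Path


def resolve_output_path(filename):
    """Return the absolute path that a relative filename resolves to."""
    # Path.resolve() anchors a relative path at the current working directory.
    return Path(filename).resolve()


if __name__ == "__main__":
    print("Working directory:", Path.cwd())
    print("File will be written to:", resolve_output_path("Prior_Analytics.txt"))
```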

Note:

docx is for opening .docx files.
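
A .docx file is a ZIP-based package, which is why a plain-text file cannot be opened as one. The check below is a rough sketch using only the standard library. looks_like_docx is a hypothetical helper, and the idea that python-docx raises PackageNotFoundError for files that are not ZIP packages is an assumption drawn from the error in the question, not a statement of the library's documented behavior.

```python
import os
import tempfile
import zipfile


def looks_like_docx(path):
    """Rough check: a .docx package is a ZIP archive containing [Content_Types].xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        return "[Content_Types].xml" in package.namelist()


if __name__ == "__main__":
    # Write a plain-text file, as the question's code does, and test it.
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "Prior_Analytics.txt")
        with open(text_path, "w", encoding="utf-8") as f:
            f.write("Plain text, not a Word package.\n")
        print(looks_like_docx(text_path))  # False
```

A check like this only says whether the file could be a Word package; it does not validate the document's contents.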

Answer 2

I figured it out.

document = Document()
sections = document.sections
