Problem extracting text from text boxes in a Word document using python-docx


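I want to extract the text inside text boxes using python-docx. I have tried the paragraphs and tables properties, but neither returns the text. Please let me know whether this text can be extracted with python-docx, or whether I should try another library.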

Here is the link to the file: https://docs.google.com/document/d/1_metGi8aqPn8XciyjpbONOq4XdTNeuDD/edit?usp=share_link&ouid=104132986734597597850&rtpof=true&sd=true

Here is the code snippet that uses the paragraphs property:

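!pip install python-docx  # notebook cell syntax; in a shell, run: pip install python-docx
import docx

source_file = 'textbox.docx'
doc = docx.Document(source_file)

# Collect the text of every body-level paragraph
text = []
for para in doc.paragraphs:
    text.append(para.text)
text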

Here is the code snippet that uses the tables property:

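!pip install python-docx  # notebook cell syntax; in a shell, run: pip install python-docx
import docx

source_file = 'textbox.docx'
doc = docx.Document(source_file)

# Collect the text of every cell in every body-level table
text = []
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            text.append(cell.text)
text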

The result is the same in both cases (no text is extracted):

[]
python textbox python-docx
1 answer (1 vote)

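python-docx does not support text boxes.
However, you can extract the text with win32com (part of the pywin32 package, which requires Windows with Microsoft Word installed).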

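Iterate over the shapes in the document and check whether each shape's Type is a text box, whose numeric value is 17 (msoTextBox). If it is, print its text, as in the first branch of the code. In your Word example, however, the text boxes are grouped, so the top-level shape object has type 6 (msoGroup). In that case we need to iterate over the group's items, find the text boxes (type 17) among them, and then extract the text.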

Code example

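import os
import win32com.client as win32

word = win32.gencache.EnsureDispatch('Word.Application')
# Word resolves relative paths against its own working directory,
# so pass an absolute path
source_file = os.path.abspath('textbox.docx')
doc = word.Documents.Open(source_file)

for sh in doc.Shapes:
    print(sh.Type)  # Type 17 (msoTextBox) is a text box
    if sh.Type == 17:
        print(sh.Name)
        print(sh.TextFrame.TextRange.Text)
    elif sh.Type == 6:  # 6 (msoGroup) is a group of shapes
        for grp in sh.GroupItems:
            print(grp.Name)
            print(grp.Type)
            if grp.Type == 17:
                t = grp.TextFrame.TextRange.Text
                # Word ends each paragraph with '\r'; strip it for display
                print(t.replace('\r', ''))

doc.Close(False)  # close the document without saving changes
word.Quit()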

The output is (the text is truncated in this display):

Power Case Study: Solar PanelsMany households in Florida have adopted solar panels as a source of energy. There have been a total of 155,383 solar panel installations in Florida (
Yu et al. 2018), out of which 90.7% (140,265) are residential installations, and 9.3% (15118) are commercial installations. Over 90% of the solar panels have been installed since 
2017.Solar panels can provide access to continuous power supply when a hurricane damages the primary grid, as exemplified in Hurricane Ian by Babcock Ranch, a community located 12
 miles northeast of Fort Myers. 
...
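
If Word or Windows is not available (for example in a hosted notebook), one possible alternative is to read the .docx package directly, since a .docx file is a ZIP archive whose main part is word/document.xml. The sketch below is not part of the answer above and is not a python-docx API; the helper names iter_textboxes and textbox_texts are made up for illustration. It assumes the text box content is stored in w:txbxContent elements, and that any legacy duplicate of a text box sits under mc:Fallback and can be skipped. It uses only the standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML and markup compatibility
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
MC = "{http://schemas.openxmlformats.org/markup-compatibility/2006}"


def iter_textboxes(elem):
    """Yield w:txbxContent elements below elem, skipping mc:Fallback copies."""
    for child in elem:
        if child.tag == MC + "Fallback":
            continue  # assumed to be a legacy duplicate of the same text box
        if child.tag == W + "txbxContent":
            yield child
        else:
            yield from iter_textboxes(child)


def textbox_texts(docx_path):
    """Return one string per text box; paragraphs are joined with newlines."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    results = []
    for box in iter_textboxes(root):
        paragraphs = [
            "".join(t.text or "" for t in p.iter(W + "t"))
            for p in box.iter(W + "p")
        ]
        results.append("\n".join(paragraphs))
    return results
```

Calling textbox_texts('textbox.docx') would then return a list with one entry per text box. Because the search walks the whole document tree, grouped text boxes should be found without special handling, though this has not been checked against the file linked in the question.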