Problem extracting text from text boxes in a Word document using python-docx

Question

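I want to extract the text inside text boxes using python-docx. I tried the paragraphs and tables properties, but neither returned the text. Please let me know whether this text can be extracted with python-docx, or whether I should try another library.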

Here is a link to the file: https://docs.google.com/document/d/1_metGi8aqPn8XciyjpbONOq4XdTNeuDD/edit?usp=share_link&ouid=104132986734597597850&rtpof=true&sd=true

Here is the code snippet using the paragraphs property:

!pip install python-docx
import docx

source_file = 'textbox.docx'
doc = docx.Document(source_file)

# Collect the text of every body-level paragraph
text = []
for para in doc.paragraphs:
    text.append(para.text)
text

Here is the code snippet using the tables property:

!pip install python-docx
import docx

source_file = 'textbox.docx'
doc = docx.Document(source_file)

# Collect the text of every cell in every body-level table
text = []
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            text.append(cell.text)
text

The result is the same in both cases (no text is extracted):

[]

Answer 1

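python-docx does not support text boxes.
However, you can extract the text with win32com.

Iterate over the shapes in the document and check whether each shape's Type is a text box, whose numeric value is 17. If it is, simply print the text, as in the first branch of the code. In your sample Word document, however, the text boxes are grouped, so the shape object has Type 6 (a group). In that case we need to loop over the items of the group, find the text boxes (Type 17) again, and then extract the text.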

Code example

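import os
import win32com.client as win32

MSO_GROUP = 6      # MsoShapeType value for a group of shapes
MSO_TEXT_BOX = 17  # MsoShapeType value for a text box

word = win32.gencache.EnsureDispatch('Word.Application')
# Word resolves relative paths against its own working directory, so pass an absolute path
source_file = os.path.abspath('textbox.docx')
doc = word.Documents.Open(source_file)

for sh in doc.Shapes:
    print(sh.Type)
    if sh.Type == MSO_TEXT_BOX:
        print(sh.Name)
        print(sh.TextFrame.TextRange.Text)
    elif sh.Type == MSO_GROUP:
        # Grouped text boxes are only reachable through GroupItems
        for grp in sh.GroupItems:
            print(grp.Name)
            print(grp.Type)
            if grp.Type == MSO_TEXT_BOX:
                t = grp.TextFrame.TextRange.Text
                print(t.replace('\r', ''))

doc.Close(False)  # close without saving changes
word.Quit()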

The output is (the text is truncated in this display):

Power Case Study: Solar PanelsMany households in Florida have adopted solar panels as a source of energy. There have been a total of 155,383 solar panel installations in Florida (
Yu et al. 2018), out of which 90.7% (140,265) are residential installations, and 9.3% (15118) are commercial installations. Over 90% of the solar panels have been installed since 
2017.Solar panels can provide access to continuous power supply when a hurricane damages the primary grid, as exemplified in Hurricane Ian by Babcock Ranch, a community located 12
 miles northeast of Fort Myers. 
...
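
If Word or Windows is not available, one possible alternative is to read the XML inside the .docx directly. The following is a minimal sketch using only the standard library, assuming the text boxes are stored as w:txbxContent elements in word/document.xml (text boxes in headers or footers live in other parts and are not covered). The function name extract_textbox_text is made up for illustration, and it has not been run against the file linked in the question. Word often stores each text box twice, once as a drawing and once as a legacy copy under mc:Fallback, so the sketch skips the fallback to avoid duplicates.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and markup compatibility
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def _walk(elem, inside_textbox, out):
    """Collect paragraph text found under w:txbxContent elements."""
    # Skip the legacy fallback copy so each text box is reported once
    if elem.tag == f"{{{MC}}}Fallback":
        return
    if elem.tag == f"{{{W}}}txbxContent":
        inside_textbox = True
    if inside_textbox and elem.tag == f"{{{W}}}p":
        # A paragraph's text is the concatenation of its w:t runs
        text = "".join(t.text or "" for t in elem.iter(f"{{{W}}}t"))
        if text:
            out.append(text)
        return
    for child in elem:
        _walk(child, inside_textbox, out)


def extract_textbox_text(docx_path):
    """Return one string per non-empty paragraph inside text boxes (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    out = []
    _walk(root, False, out)
    return out
```

Usage would be something like `extract_textbox_text('textbox.docx')`, which returns a list of strings. Unlike the win32com version, this does not distinguish grouped from ungrouped text boxes, because it looks only at where the text content sits in the XML.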
