How to extract cell formatting (bold, italic, etc.) from an Excel file using Python?

Question

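I'm trying to extract both the content of the cells (essentially text) and the formatting of that text from an Excel file. The Excel file I'm working with looks like the screenshot below: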

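The text in a cell may be bold, italic, or struck through, and I need to extract both the text and its formatting into a Python string. For example, if a cell contains: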

line 1

line 2

line 3

I'd like to get a Python string that looks like this:

- line 1\n- **line 2**\n- *line 3*

That way I capture both the text and the formatting information.

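I tried to find a solution with openpyxl, but it seems it can only apply cell formatting, not extract it. The xlrd library doesn't seem to support xlsx files. I'm currently trying the pyexcel library.

Do you have any ideas? Thanks.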

Answer 1

You can import Font from openpyxl (openpyxl.styles) and use cell.font.bold to check whether a cell is written in bold; it gives True or False.

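from openpyxl import load_workbook
sheet = load_workbook('example.xlsx').active  # 'example.xlsx' is a placeholder path
cell = sheet['A2']  # the coordinate must be a string
bold_status = cell.font.bold
italic_status = cell.font.italic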

For more information about fonts in openpyxl: http://openpyxl.readthedocs.io/en/2.5/api/openpyxl.styles.fonts.html
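To turn these whole-cell attributes into the Markdown-like markers the question asks for, a minimal sketch could look like the following. It assumes each line sits in its own cell (a hypothetical layout, not the questioner's multi-line cell), builds the workbook in memory with BytesIO instead of reading a real file, and uses "~~" for strikethrough as an assumed convention; cell_to_markdown is an illustrative helper name, not part of openpyxl.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def cell_to_markdown(cell):
    """Wrap a cell's text in Markdown-style markers based on its whole-cell font."""
    text = "" if cell.value is None else str(cell.value)
    if not text:
        return text
    font = cell.font
    if font.italic:
        text = f"*{text}*"
    if font.bold:
        text = f"**{text}**"
    if font.strike:
        text = f"~~{text}~~"  # assumed strikethrough marker
    return text


# Build a small workbook in memory; it stands in for a real .xlsx file on disk
wb = Workbook()
ws = wb.active
ws["A1"] = "line 1"
ws["A2"] = "line 2"
ws["A2"].font = Font(bold=True)
ws["A3"] = "line 3"
ws["A3"].font = Font(italic=True)

buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Reload it: the formatting is read back from cell.font, as it would be for a file on disk
sheet = load_workbook(buffer).active
lines = [cell_to_markdown(sheet[f"A{row}"]) for row in range(1, 4)]
print(lines)
```

Because cell.font describes the whole cell, this approach cannot distinguish differently formatted lines inside a single cell.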

Answer 2

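With openpyxl, if you set the rich_text flag to True when loading the workbook (available in openpyxl 3.1 and later), you can extract both the style of the entire cell (as shown in the previous answer) and the style of individual portions of text within each cell.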

Here is an example in Python 3.10 that prints out all italic text:

from openpyxl import load_workbook
from openpyxl.cell.rich_text import CellRichText, TextBlock


# rich_text=True is required; otherwise cells are loaded as plain strings and the formatting of individual text runs is lost
workbook = load_workbook('trash/test.xlsx', rich_text=True)

# Assume you're working with the first sheet
sheet = workbook.active

for row in sheet.iter_rows():
    for cell in row:
        # Check if the entire cell is italicized
        if cell.font.italic:
            print(f"Cell {cell.coordinate} is completely italicized: {cell.value}")

        # cell.value will either be CellRichText or str, with CellRichText having more formatting that needs to be checked.
        if isinstance(cell.value, CellRichText):
            for text_block in cell.value:
                # Ensure it's a text block not a plain string, and that it is in fact italicized
                if isinstance(text_block, TextBlock) and text_block.font.italic:
                    print(f"Cell {cell.coordinate} contains italicized text: {text_block.text}")

workbook.close()

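This example prints all italic text along with the cell that contains it. For a fully italicized cell, it prints a single entry for that cell; for a cell with mixed formatting, it prints one entry per block of italic text. This means that a cell that is fully italicized and also contains other inline formatting will produce multiple entries, with some caveats.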

If cell A1 contains "this is an example" (the whole cell in italics, with "an" also in bold), you will get the following output from the script:

Cell A1 is completely italicized: this is an example
Cell A1 contains italicized text: an
Cell A1 contains italicized text:  example

One entry for the whole cell
"this is" appears to be skipped
One entry for the bold-italic "an"
One entry for the italic "example"
