Reading an Excel sheet that contains multiple tables whose headers have a non-white cell background color


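I have an Excel file with multiple tables on the same worksheet. The tables have different numbers of columns and different numbers of rows. The good news is that the table headers have a background color, while the table contents have a white background.

I would like to know whether xlrd or another package can read each of these tables into a separate DataFrame.

The approach I am currently considering is rather verbose and probably not ideal.

For example: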

import xlrd
book = xlrd.open_workbook("some.xls", formatting_info=True)
sheets = book.sheet_names()
for index, sh in enumerate(sheets):
    sheet = book.sheet_by_index(index)
    rows, cols = sheet.nrows, sheet.ncols
    for row in range(rows):
        for col in range(cols):
            xfx = sheet.cell_xf_index(row, col)
            xf = book.xf_list[xfx]
            bgx = xf.background.pattern_colour_index
            # 64 is the colour index of an unfilled (white) cell
            if bgx != 64:
                Header_row = row

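Then I would iterate over this Header_row, collect all of its cell values, and use them as the DataFrame column names. After that I would keep parsing the rows below by their first column until I hit a blank cell or a row with only one or two non-empty cells.

As you can see, this gets a bit verbose and is probably not the best approach.

Thanks for any help on how to quickly extract all the tables as separate DataFrames.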
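
The scan described above can be sketched on plain Python lists, independent of xlrd. This is only a sketch under assumptions: `values` and `colours` are hypothetical row-major grids holding each cell's value and its `pattern_colour_index`, 64 is taken to mean "no fill" as in the check above, a header is a run of adjacent coloured cells, and a table ends at the first row whose first cell is empty or coloured. The names `header_runs` and `find_tables` are made up for illustration.

```python
from typing import Any, List, Tuple

WHITE = 64  # assumption: colour index reported for cells with no fill


def header_runs(colour_row: List[int], white: int = WHITE) -> List[Tuple[int, int]]:
    """Return (start, end) spans of adjacent non-white cells; end is exclusive."""
    runs = []
    start = None
    for col, colour in enumerate(colour_row):
        if colour != white and start is None:
            start = col  # a coloured run begins here
        elif colour == white and start is not None:
            runs.append((start, col))  # the run ended just before this cell
            start = None
    if start is not None:
        runs.append((start, len(colour_row)))
    return runs


def find_tables(
    values: List[List[Any]], colours: List[List[int]], white: int = WHITE
) -> List[Tuple[List[Any], List[List[Any]]]]:
    """Return one (columns, rows) pair per coloured header run."""
    tables = []
    for r, colour_row in enumerate(colours):
        for start, end in header_runs(colour_row, white):
            columns = values[r][start:end]
            rows = []
            for below in range(r + 1, len(values)):
                first = values[below][start]
                # Stop at an empty first cell or at the next header.
                if first in ("", None) or colours[below][start] != white:
                    break
                rows.append(values[below][start:end])
            tables.append((columns, rows))
    return tables


# Made-up example: two tables side by side, then a third one further down.
values = [
    ["id", "name", "", "code"],
    [1, "a", "", "x"],
    [2, "b", "", ""],
    ["", "", "", ""],
    ["k", "v", "w", ""],
    [9, 8, 7, ""],
]
colours = [
    [22, 22, 64, 22],
    [64, 64, 64, 64],
    [64, 64, 64, 64],
    [64, 64, 64, 64],
    [22, 22, 22, 64],
    [64, 64, 64, 64],
]
tables = find_tables(values, colours)
```

Each `(columns, rows)` pair could then be turned into a frame with `pandas.DataFrame(rows, columns=columns)`.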

python python-3.x xlrd
1 Answer
Something like this:

import xlrd
# from typing import Dict

book = xlrd.open_workbook("some.xls", formatting_info=True)


def is_header(sheet, row, col, exclude_color=64):
    xf_index = sheet.cell_xf_index(row, col)
    bg_color = book.xf_list[xf_index].background.pattern_colour_index
    return bg_color != exclude_color


def parse_sheet(sheet):
    """Parse a sheet and retrieve data as dataframe"""
    column_headers = dict()  # type: Dict[int, str]
    for row in range(sheet.nrows):
        # We skip rows if first cell is not a header and has no value
        # TODO: Remove if that skips line 13 as well
        if not is_header(sheet, row, 0) and not sheet.cell_value(row, 0):
            column_headers.clear()
            continue
        # Otherwise, we will populate the list of headers for column
        # And we will parse other data
        c_headers = [c for c in range(sheet.ncols) if is_header(sheet, row, c)]
        if c_headers:
            for col in c_headers:
                column_headers[col] = sheet.cell_value(row, col)
        else:
            for col in range(sheet.ncols):
                value = sheet.cell_value(row, col)
                # TODO: Add data in the dataframe and use column headers


# book.nsheets is the sheet count; sheet_names() returns a list, not a number
for index in range(book.nsheets):
    parse_sheet(book.sheet_by_index(index))
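
One way the two TODOs might be filled in is sketched below. It is not the answerer's code: it assumes the tables are stacked vertically and separated by at least one blank row, it passes `book` explicitly instead of using a global, and it returns plain `(columns, rows)` lists rather than DataFrames so it stays free of dependencies. It only touches attributes that xlrd sheets and books expose (`nrows`, `ncols`, `cell_value`, `cell_xf_index`, `xf_list`), so any object with the same attributes works.

```python
from typing import Any, List, Tuple

Table = Tuple[List[Any], List[List[Any]]]


def is_header(book, sheet, row: int, col: int, exclude_color: int = 64) -> bool:
    """True if the cell's background colour index differs from exclude_color."""
    xf_index = sheet.cell_xf_index(row, col)
    return book.xf_list[xf_index].background.pattern_colour_index != exclude_color


def parse_sheet(book, sheet) -> List[Table]:
    """Split a sheet into (columns, rows) tables, one per coloured header row."""
    tables = []  # type: List[Table]
    header_cols = []  # column indices of the table currently being read
    rows = None  # data rows of the current table, or None between tables
    for row in range(sheet.nrows):
        cols = [c for c in range(sheet.ncols) if is_header(book, sheet, row, c)]
        if cols:
            # A coloured row starts a new table.
            header_cols = cols
            rows = []
            tables.append(([sheet.cell_value(row, c) for c in cols], rows))
        elif rows is not None:
            record = [sheet.cell_value(row, c) for c in header_cols]
            if all(v == "" for v in record):
                rows = None  # a blank row closes the current table
            else:
                rows.append(record)
    return tables
```

With a real workbook this would be called as `parse_sheet(book, book.sheet_by_index(0))`, and each result could be wrapped as `pandas.DataFrame(rows, columns=columns)`. Note that `formatting_info=True` is only available for `.xls` files.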
