How to open an xlsx file with Python 3

Question

I have an xlsx file with one sheet. I tried to open it with Python 3 (the xlrd library), but I get an empty file!

I use this code:

import xlrd

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = xlrd.open_workbook(file_errors_location)

I get no error, but when I type:

workbook_errors.nsheets

I get "0", even though the file has sheets... When I type:

workbook_errors

I get:

xlrd.book.Book object at 0x2..

Any help? Thanks
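One way to check whether the workbook really contains sheets, independently of xlrd, is to look inside it: an .xlsx file is a ZIP archive, and its sheet list is stored in the workbook part. The sketch below uses only the standard library and assumes the workbook part lives at xl/workbook.xml (where Excel writes it; a fully general reader would follow the package relationships in _rels/.rels). The helper names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML main part used in .xlsx workbooks
MAIN_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names_from_workbook_xml(xml_bytes):
    """Parse workbook.xml content and return the declared sheet names in order."""
    root = ET.fromstring(xml_bytes)
    return [s.get("name") for s in root.findall("m:sheets/m:sheet", MAIN_NS)]


def xlsx_sheet_names(path):
    """Open an .xlsx (a ZIP archive) and list its sheets.

    Assumes the workbook part is stored at xl/workbook.xml, which is where
    Excel writes it.
    """
    with zipfile.ZipFile(path) as zf:
        return sheet_names_from_workbook_xml(zf.read("xl/workbook.xml"))
```

For example, xlsx_sheet_names(file_errors_location) should list the sheets of errors1.xlsx; if it does while xlrd reports nsheets as 0, the problem most likely lies with the reader rather than the file.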

Answer 1

You can use Pandas: pandas.read_excel works just like pandas.read_csv:

import pandas as pd
file_errors_location = 'C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx'
df = pd.read_excel(file_errors_location)
print(df)

Answer 2

There are two modules for reading Excel files: openpyxl and xlrd.

This script uses xlrd to convert the Excel data into a list of dictionaries:

import xlrd

# Note: xlrd 2.0 and later only read legacy .xls files; opening an .xlsx
# this way requires an older xlrd (< 2.0) or switching to openpyxl.
workbook = xlrd.open_workbook('C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx', on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []  # the header row that holds the column names
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))
# transform the worksheet into a list of dictionaries
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)
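The header-to-dictionary step above does not depend on xlrd itself. A minimal sketch of the same idea as a reusable function, assuming the input is any iterable of rows (for example the tuples yielded by openpyxl's ws.iter_rows(values_only=True)); rows_to_dicts is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Sequence


def rows_to_dicts(rows: Iterable[Sequence[Any]]) -> List[Dict[Any, Any]]:
    """Turn rows of cell values into dicts keyed by the first (header) row."""
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty sheet: no header and no data
        return []
    # zip() stops at the shorter sequence, so a short row simply yields fewer keys
    return [dict(zip(header, row)) for row in it]
```

With openpyxl, rows_to_dicts(ws.iter_rows(values_only=True)) would give the same list-of-dictionaries shape as the xlrd script, without the xls-only restriction.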

Answer 3

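Unfortunately, xlrd, the Python engine pandas uses to read Excel documents, has explicitly removed support for anything other than .xls files (as of xlrd 2.0).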

So here is what you can do now:

Install openpyxl:

https://openpyxl.readthedocs.io/en/stable/

Change your pandas code to:

pandas.read_excel('cat.xlsx', engine='openpyxl')

Note: this worked for me with the latest version of Pandas at the time (1.1.5). I was previously on version 0.24.0, where it did not work, so I had to upgrade.
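Because xlrd 2.x only handles .xls, it can help to pick the engine from the file's actual contents rather than its extension. A minimal standard-library sketch: legacy .xls files are OLE2 compound documents that start with the bytes D0 CF 11 E0 A1 B1 1A E1, while .xlsx files are ZIP archives that start with PK\x03\x04. The helper names are hypothetical; the returned strings are engine names that pandas.read_excel accepts.

```python
# Leading bytes ("magic numbers") of the two Excel container formats
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx / .xlsm (Office Open XML)


def engine_for_header(head: bytes) -> str:
    """Map a file's first bytes to a pandas read_excel engine name."""
    if head.startswith(OLE2_MAGIC):
        return "xlrd"      # xlrd 2.x still reads .xls
    if head.startswith(ZIP_MAGIC):
        return "openpyxl"  # any ZIP is assumed to be an xlsx-style workbook
    raise ValueError("file does not look like an Excel workbook")


def engine_for_file(path) -> str:
    """Read the first 8 bytes of the file and choose an engine."""
    with open(path, "rb") as f:
        return engine_for_header(f.read(8))
```

Usage would look like pandas.read_excel(path, engine=engine_for_file(path)). The ZIP check does not inspect the archive contents, so a renamed .docx would also be sent to openpyxl, which would then fail with its own error.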

Answer 4

Another approach:

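import openpyxl

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
# load_workbook opens an existing file; openpyxl.Workbook() would only create a new empty one
workbook_errors = openpyxl.load_workbook(file_errors_location)
print(workbook_errors.sheetnames)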
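To go from an opened workbook to actual cell values, a minimal sketch with openpyxl (assumed installed) might look like this; read_sheet_rows is a hypothetical helper, while load_workbook, wb.active, iter_rows(values_only=True) and close() are openpyxl's own API.

```python
import openpyxl


def read_sheet_rows(path, sheet_name=None):
    """Return all rows of one worksheet as tuples of cell values."""
    # read_only=True streams the file, which is lighter for large workbooks;
    # data_only=True returns cached formula results instead of formula strings
    wb = openpyxl.load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        return list(ws.iter_rows(values_only=True))
    finally:
        wb.close()  # read-only workbooks keep the file handle open until closed
```

The openpyxl counterpart of xlrd's nsheets is len(workbook_errors.sheetnames).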

Answer 5

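For those who don't want to use openpyxl because:

openpyxl runs very slowly;
they find openpyxl too complicated;
they just want to load a worksheet as a DataFrame;

I recommend xlwings (install it with pip install xlwings). Note that xw.Book opens the file in an installed copy of Microsoft Excel, so Excel must be available. Here is an example: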

import xlwings as xw
import pandas as pd

wb_path = ".\\***.xlsx" # the .xlsx file is in the same directory as the .py script
ws_name = "***" # the name of the worksheet to load data

wb = xw.Book(wb_path)
ws = wb.sheets[ws_name]

MAX_ROW = 100 # the rows to read, this can be larger, the empty rows can be dropped in Pandas dataframe
MAX_COL = 40 # the columns to read

data = ws[:MAX_ROW,:MAX_COL].value
df = pd.DataFrame(data)

With the code above, the worksheet can be loaded into a Pandas DataFrame faster than with openpyxl.
