pd.read_excel error when reading from a directory

Problem description

Hello, I am a Python beginner.

I am trying to use pd.read_excel to read some Excel files (.xlsx) from a folder.

import numpy as np
import pandas as pd
import os
import torch.nn as nn
import matplotlib.pyplot as plt
def get_data(data_path):
    print(f"Reading data from {data_path}")
    data = pd.read_excel(data_path, engine='openpyxl')
    num_rows, num_columns = data.shape
    print("Original data:")
    print(data.head(100))  # print the first 100 rows of the DataFrame
    print(f"Number of rows:{num_rows}")
    print(f"Number of columns:{num_columns}")
    print("Number of rows in data:", len(data))
    return data

data_path = r'E:\SVN_Pengfei\5.2020年工作\BaiduNetdiskWorkspace\1.ATM项目\4.技术文档\高速数据资料\北五环数据\雷达数据'
predict_values_name = ' Speed Value'
data=get_data(data_path)

But it always fails with this error:

 Reading data from E:\SVN_Pengfei\5.2020年工作\BaiduNetdiskWorkspace\1.ATM项目\4.技术文档\高速数据资料\北五环数据\雷达数据
Traceback (most recent call last):
  File "e:/SVN_Pengfei/5.2020年工作/BaiduNetdiskWorkspace/1.ATM项目/2.相关软件/模型、代码及说明文档/LSTM预测流量/test1.py", line 19, in <module>
    data=get_data(data_path)
  File "e:/SVN_Pengfei/5.2020年工作/BaiduNetdiskWorkspace/1.ATM项目/2.相关软件/模型、代码及说明文档/LSTM预测流量/test1.py", line 8, in get_data
    data = pd.read_excel(data_path, engine='openpyxl')
  File "D:\Python37\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "D:\Python37\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "D:\Python37\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "D:\Python37\lib\site-packages\pandas\io\excel\_openpyxl.py", line 480, in __init__
    super().__init__(filepath_or_buffer)
  File "D:\Python37\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "D:\Python37\lib\site-packages\pandas\io\excel\_openpyxl.py", line 492, in load_workbook
    filepath_or_buffer, read_only=True, data_only=True, keep_links=False
  File "D:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 345, in load_workbook
    data_only, keep_links, rich_text)
  File "D:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 123, in __init__
    self.archive = _validate_archive(fn)
  File "D:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 93, in _validate_archive
    raise InvalidFileException(msg)
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support  file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm

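I can change data_path to one specific file, for example data_path = r'E:\SVN_Pengfei\5.2020年工作\BaiduNetdiskWorkspace\1.ATM项目\4.技术文档\高速数据资料\北五环数据\雷达数据\bcde.xlsx'

That works fine, but it is not what I want. I want it to read the folder, and I will have another module that handles the multiple files.

pandas version: 1.1.5, Python version: 3.7.0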

pandas python-3.7
1 Answer

Remove engine='openpyxl':

data = pd.read_excel(data_path)
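
Note that pd.read_excel reads a single workbook, not a folder. The blank format name in the error message ("does not support  file format") is consistent with the folder path having no file extension. If data_path still points to the directory, dropping the engine argument may therefore not be enough on its own. Below is a minimal sketch of reading every workbook in the folder, assuming the files are .xlsx and sit directly inside it. The names list_excel_files and get_data_from_folder are hypothetical helpers introduced for illustration, not part of pandas:

```python
from pathlib import Path


def list_excel_files(folder):
    """Return the .xlsx files directly inside `folder`, sorted by path."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # Skip sub-folders, other file types, and the "~$name.xlsx" lock files
    # that Excel creates while a workbook is open.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() == ".xlsx"
        and not p.name.startswith("~$")
    )


def get_data_from_folder(folder):
    """Read each workbook in `folder` into a dict of DataFrames keyed by file name."""
    # Imported here so that list_excel_files can be used without pandas installed.
    import pandas as pd

    frames = {}
    for path in list_excel_files(folder):
        print(f"Reading data from {path}")
        # read_excel is called once per file, never on the folder itself.
        frames[path.name] = pd.read_excel(path, engine="openpyxl")
    return frames
```

With the folder from the question, frames = get_data_from_folder(data_path) should give one DataFrame per file. If all the sheets share the same columns, pd.concat(list(frames.values())) could combine them into a single DataFrame.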