Get the headers of a CSV file

Question

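I have 3000 Excel files. I want to get the header of each file and store the headers as CSV. However, I get a parsing error:

'utf-8' codec can't decode byte 0xfa in position 1: invalid start byte

I have already looked at this post, and it did not solve the problem: UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>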

import csv
import glob

import pandas as pd

all_files = glob.glob("Converted Excels/*.xlsx")
file = all_files[0]

# Try 1: read the first row with the csv module
columns = []
with open(file, "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        columns.append(row)  # keep only the first row (the header)
        break

# Try 2: read only the header row with pandas
df = pd.read_csv(file, header=0, nrows=1)
df

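Here is a sample file: https://docs.google.com/spreadsheets/d/194QD14g_L0NQK6j3yO2Et2ZzycfQDzJXu7vdlr20owA/edit?usp=sharing

I converted the files from PDF to Excel, and during the conversion I specified encoding="utf8".

How can I get the headers from this file?

Thanks a lot for your help.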

Answer 1

.xlsx is not a CSV file. You cannot use pandas.read_csv() or the csv module to read an .xlsx file.
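The error itself is a hint: an .xlsx file is binary (a ZIP archive), so decoding it as UTF-8 text fails on the first byte that is not valid UTF-8. A minimal sketch, using only the standard library, that checks whether a file is really a ZIP-based workbook before anyone tries to parse it as CSV. The helper name `looks_like_xlsx` is hypothetical, and the check relies on `xl/workbook.xml` being the conventional (not guaranteed) location of the workbook part:

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive that contains a workbook part."""
    # .xlsx files are ZIP archives; plain-text CSV files are not.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # Files written by Excel and most converters store the workbook here.
        return "xl/workbook.xml" in zf.namelist()
```

For example, `looks_like_xlsx(all_files[0])` should return True for the converted files, which confirms they need an Excel reader rather than a CSV reader.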

For Excel files, use pandas.read_excel() or one of the dedicated Excel modules. See: www.python-excel.org
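A minimal sketch of that approach for the original goal (collecting the header of every workbook into one CSV), assuming pandas and an Excel engine such as openpyxl are installed and that the header is the first row of the first sheet (the `read_excel` default). The function name `collect_headers` and the output layout (file path, then column names) are illustrative choices, not part of the answer:

```python
import csv
import glob

import pandas as pd


def collect_headers(pattern, out_csv):
    """Write one CSV row per workbook: the file path followed by its column headers."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        # header=0 treats the first row as column names; nrows=1 keeps at most
        # one data row in the DataFrame, since only the columns are needed.
        df = pd.read_excel(path, header=0, nrows=1)
        rows.append([path] + [str(c) for c in df.columns])
    # newline="" is what the csv module expects for files it writes.
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows
```

For example, `collect_headers("Converted Excels/*.xlsx", "headers.csv")`. Empty header cells come back from pandas as names like `Unnamed: 2`, so you may want to filter those out.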

As far as I know, an .xlsx file is a ZIP archive containing XML files, so you could also try unzipping it and reading the XML.
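If you would rather avoid third-party libraries, the ZIP-plus-XML route can be sketched with `zipfile` and `xml.etree.ElementTree`. This assumes the first worksheet is stored at the conventional path `xl/worksheets/sheet1.xml` (a full reader would resolve it through `xl/workbook.xml` and its relationship files) and it ignores the gaps left by empty cells, which Excel simply omits. `xlsx_header` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used by worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _join_text(el):
    """Concatenate the plain <t> text or the rich-text <r><t> runs of an element."""
    parts = el.findall("m:t", NS) + el.findall("m:r/m:t", NS)
    return "".join(t.text or "" for t in parts)


def xlsx_header(path, sheet="xl/worksheets/sheet1.xml"):
    """Return the values of the first stored row of one worksheet as strings."""
    with zipfile.ZipFile(path) as zf:
        # Text cells usually hold an index into the shared-strings table.
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = [_join_text(si) for si in sst.findall("m:si", NS)]
        sheet_root = ET.fromstring(zf.read(sheet))
    first_row = sheet_root.find("m:sheetData/m:row", NS)
    if first_row is None:  # the sheet has no rows at all
        return []
    header = []
    for c in first_row.findall("m:c", NS):
        kind = c.get("t")
        if kind == "s":  # shared string: <v> is the table index
            header.append(shared[int(c.find("m:v", NS).text)])
        elif kind == "inlineStr":  # string stored directly in the cell
            header.append(_join_text(c.find("m:is", NS)))
        else:  # numbers, booleans, formula results: raw <v> text
            v = c.find("m:v", NS)
            header.append(v.text if v is not None else "")
    return header
```

This is considerably more fragile than `pandas.read_excel()`, so it is mainly useful for understanding the file format or for environments where pandas is unavailable.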
