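I want to read a CSV as dictionaries in Python, but I have run into a problem because the CSV contains a header that is used more than once, like this:
id | name | labels | labels |
---|---|---|---|
01 | one | mytask | myproduct |
02 | two | mylabel |
The standard way to import a CSV into Python looks like this: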
    import csv

    # Read the CSV file into a list of dictionaries
    with open('data.csv', 'r', newline='') as file:
        csv_reader = csv.DictReader(file)
        data = [row for row in csv_reader]
    print(data)
Unfortunately, when there are multiple "labels" values, this code swallows the first one. It outputs:

    [
        {'id': '01', 'name': 'one', 'labels': 'myproduct'},
        {'id': '02', 'name': 'two', 'labels': 'mylabel'},
    ]

Is there a way to read the second "labels" value without things getting complicated? My preferred output looks like this:

    [
        {'id': '01', 'name': 'one', 'labels': ['mytask', 'myproduct']},
        {'id': '02', 'name': 'two', 'labels': 'mylabel'},
    ]
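You can define your own reader that correctly handles files with duplicate headers.
When the reader is initialized, parse the first row to map every header name to its corresponding column indices. As you read each row, iterate over this mapping and use the column indices stored for each header name to assign the values to the right key of the output dictionary.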
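To make the mapping step concrete, here is a small sketch of what it produces for the header row from the question. The variable names are illustrative only and are not part of the csv module.

```python
# Header row from the question's sample file
headers = ['id', 'name', 'labels', 'labels']

# Map each header name to the list of column indices where it appears
col_indices = {}
for i, h in enumerate(headers):
    col_indices.setdefault(h, []).append(i)

print(col_indices)
# {'id': [0], 'name': [1], 'labels': [2, 3]}
```

A header that occurs once maps to a single index, while a repeated header collects every column it appears in.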
    import csv

    class DHDictReader:
        def __init__(self, iterable, dialect='excel', **kwargs):
            self._reader = csv.reader(iterable, dialect, **kwargs)
            # Get header row
            headers = next(self._reader)
            # Map header names to column indices
            self._row_cols = {}
            for i, h in enumerate(headers):
                if h in self._row_cols:
                    self._row_cols[h].append(i)
                else:
                    self._row_cols[h] = [i]

        # The iterator for this object is the object itself:
        # calling __next__ on it yields the next record
        def __iter__(self):
            return self

        def __next__(self):
            # Get next row
            row = next(self._reader)
            out_row = {}
            for header, col_indices in self._row_cols.items():
                # Create list containing all non-empty values for this header
                out_row[header] = [row[i] for i in col_indices if i < len(row) and row[i]]
                # If this header contains only one value, replace the list with that value
                if len(out_row[header]) == 1:
                    out_row[header] = out_row[header][0]
            return out_row

        @property
        def dialect(self):
            return self._reader.dialect

        @property
        def line_num(self):
            return self._reader.line_num

        @property
        def fieldnames(self):
            # Return a list of the unique header names
            return list(self._row_cols)
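The dialect, line_num, and fieldnames properties are attributes that the other reader objects in the csv module also have.
Now, this class should be a drop-in replacement for the regular DictReader: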
    file_contents = """id,name,labels,labels
    01,one,mytask,myproduct
    02,two,mylabel
    """

    with open('my_file.csv', 'w') as f:
        f.write(file_contents)

    with open('my_file.csv') as f:
        reader = DHDictReader(f)
        print(list(reader))
This lists the records the way you want:

    [
        {'id': '01', 'name': 'one', 'labels': ['mytask', 'myproduct']},
        {'id': '02', 'name': 'two', 'labels': 'mylabel'}
    ]
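Note that with this approach a repeated header yields a list in some rows and a plain string in others. If a consistent type is easier to handle downstream, one possible variation always returns a list for headers that occur more than once. This is only a sketch: read_grouped is a made-up helper name, and the sample data is read from an in-memory string instead of a file.

```python
import csv
import io

def read_grouped(lines, **kwargs):
    """Yield one dict per row; headers that occur more than once always map to a list.

    read_grouped is a hypothetical helper, not part of the csv module.
    """
    reader = csv.reader(lines, **kwargs)
    headers = next(reader)
    # Map each header name to the column indices where it appears
    cols = {}
    for i, h in enumerate(headers):
        cols.setdefault(h, []).append(i)
    for row in reader:
        out = {}
        for header, indices in cols.items():
            # Collect the non-empty values found in this header's columns
            values = [row[i] for i in indices if i < len(row) and row[i]]
            if len(indices) > 1:
                out[header] = values  # repeated header: always a list
            else:
                out[header] = values[0] if values else None  # unique header: one value
        yield out

sample = io.StringIO("id,name,labels,labels\n01,one,mytask,myproduct\n02,two,mylabel\n")
rows = list(read_grouped(sample))
print(rows)
```

With the sample data, the second record gets 'labels': ['mylabel'] rather than a bare string, so callers never need to check the type.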
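After loading the CSV, you can skip the header and manipulate the data manually:

    import csv

    # Drop the first line (the header) before parsing
    raw = """id,name,label,label\n1,aaa,a1,a2\n2,bbb,b1,b2\n3,ccc,c1,c2""".splitlines()[1:]
    parsed = list(csv.reader(raw))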
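One way the manual step might look, as a sketch: it assumes the first two columns are id and name and that every remaining column is a label, which holds for the sample data but may not hold for your file. The key names are chosen for this example.

```python
import csv

# Same sample data as above, with the header line dropped
raw = "id,name,label,label\n1,aaa,a1,a2\n2,bbb,b1,b2\n3,ccc,c1,c2".splitlines()[1:]
parsed = list(csv.reader(raw))

# Assumption: columns 0 and 1 are id and name, everything after is a label
records = [{'id': row[0], 'name': row[1], 'label': row[2:]} for row in parsed]

print(records[0])
# {'id': '1', 'name': 'aaa', 'label': ['a1', 'a2']}
```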
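I would do it like this:

    # Assumes worksheet is a spreadsheet object with 1-based Cells(row, col),
    # and that lastRow / lastCol hold the last used row and column
    dictionary = {}
    for row in range(2, lastRow + 1):  # row 1 holds the headers
        for col in range(1, lastCol + 1):
            headline_cell = worksheet.Cells(1, col).Value
            cell = worksheet.Cells(row, col).Value
            if headline_cell in dictionary:
                # Promote a single value to a list before adding the next one
                if not isinstance(dictionary[headline_cell], list):
                    dictionary[headline_cell] = [dictionary[headline_cell]]
                dictionary[headline_cell].append(cell)
            else:
                dictionary[headline_cell] = cell

First you have to determine the last used column and row of the whole CSV.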
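The snippet above reads cells through a worksheet object. For a plain CSV file, the same idea can be sketched with the csv module alone, taking the row and column counts from the parsed rows. The names last_row, last_col, and columns are illustrative, the data comes from an in-memory string so the example is self-contained, and unlike the snippet above every header maps to a list.

```python
import csv
import io

sample = io.StringIO("id,name,labels,labels\n01,one,mytask,myproduct\n02,two,mylabel\n")
rows = list(csv.reader(sample))

header = rows[0]
last_row = len(rows)                  # number of rows, header included
last_col = max(len(r) for r in rows)  # width of the widest row

# Collect every value under its header name, across all data rows
columns = {}
for r in range(1, last_row):          # index 0 is the header row
    row = rows[r]
    for col in range(last_col):
        # Skip cells that a short row (or a short header) does not have
        if col >= len(row) or col >= len(header):
            continue
        columns.setdefault(header[col], []).append(row[col])

print(columns)
# {'id': ['01', '02'], 'name': ['one', 'two'], 'labels': ['mytask', 'myproduct', 'mylabel']}
```

Like the snippet above, this builds one dictionary for the whole file rather than one dictionary per row.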