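I'm new to Python and I'm trying to create an Avro file from JSON records. Before that, I generate the JSON from a CSV file using csv.DictReader(). I want a function that returns the generated JSON records one after another so that I can use it in the Avro-generating function, but whenever I return, I only get a single entry.
Can this be achieved with recursion or in some other way?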
def get_test():
    with open("test.csv",'r') as f:
        reader=csv.DictReader(f)
        for row in reader:
            json = {
                "field1": row['field1'],
                "field2": bool(row['field2']),
            }
            return json
def avro_gen():
    schema = avro.schema.parse(open("test.avsc", "rb").read())
    with open("test_" + ".avro", 'wb') as f:
        writer = DataFileWriter(f, DatumWriter(), schema)
        for i in range(get_count_csv_row()):
            writer.append(get_test())
        writer.close()
get_count_csv_row() returns the number of rows in the CSV.
Expected output:
{"field1": "test1","field2": true }
{"field1": "test2","field2": false}
Actual:
{"field1": "test1","field2": true }
{"field1": "test1","field2": true }
I expect a different JSON record on each iteration, but avro_gen() receives the same output every time.
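In your get_test() function, you return on the first row, so every call reopens the file and hands back the first JSON record. You need to collect all of them and return the collection at the end: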
def get_test():
    json_list = []
    with open("test.csv",'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            json = {
                "field1": row['field1'],
                # caution: bool() is True for any non-empty string, including "false"
                "field2": bool(row['field2'])
            }
            json_list.append(json)
    return json_list
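One thing to watch in the conversion: csv.DictReader returns every cell as a string, and bool() applied to a non-empty string is always True, so a cell containing "false" would not produce the expected false value. Below is a minimal sketch of an explicit parser. The name parse_bool and the set of accepted spellings are assumptions for illustration, since the question does not show the CSV contents.

```python
def parse_bool(text):
    """Map a CSV cell to a bool. The accepted spellings are an assumption."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no", ""):
        return False
    raise ValueError(f"not a recognised boolean: {text!r}")


print(bool("false"))        # True: any non-empty string is truthy
print(parse_bool("false"))  # False
```

With such a helper, the "field2" entry would become parse_bool(row['field2']) instead of bool(row['field2']).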
Personally, I would use a list comprehension to keep it concise:
def get_test():
    with open("test.csv",'r') as f:
        return [
            {
                "field1": row['field1'],
                "field2": bool(row['field2'])
            }
            for row in csv.DictReader(f)
        ]
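Since the question asks for a function that hands back one record at a time, a generator is another option worth considering: it yields each row as it is read instead of building the whole list in memory. This is only a sketch. The name iter_test and the path parameter are hypothetical, and the comparison against "true" assumes the CSV stores booleans as the text true/false.

```python
import csv


def iter_test(path="test.csv"):
    """Yield one record per CSV row, lazily."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "field1": row["field1"],
                # assumption: booleans are stored as the text "true"/"false"
                "field2": row["field2"].strip().lower() == "true",
            }
```

A caller can loop over it directly, for example `for record in iter_test(): ...`, exactly as it would over a list. The file stays open only while the loop is running.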
Ideally, I would actually use pandas:
import pandas as pd

def get_test():
    df = pd.read_csv("test.csv")
    # one dict per row, with field2 cast to bool
    return df[['field1', 'field2']].astype({'field2': bool}).to_dict('records')
Then, in your avro_gen() function, you need to go through all of these JSON objects and write them to your Avro file one by one:
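import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

def avro_gen():
    schema = avro.schema.parse(open("test.avsc", "rb").read())
    with open("test_" + ".avro", 'wb') as f:
        writer = DataFileWriter(f, DatumWriter(), schema)
        # append every record returned by get_test(), then flush and close
        for json in get_test():
            writer.append(json)
        writer.close()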