I'm using the following code to read a CSV file from S3:
import csv
import json
import boto3
s3 = boto3.client('s3', 'us-east-1')
bucket = "bucket"
key = "key"
obj = s3.get_object(Bucket=bucket, Key=key)
fieldnames = [i for i in range(0,13)]
lines1 = obj['Body'].read().decode('utf-8').split('\n')
testls = [row for row in csv.DictReader(lines1[1:], fieldnames)]
out = json.dumps([row for row in testls])
The problem is that one of the CSV fields contains JSON, so the JSON string produced by the last step looks like this:
{"Date": "2020-03-02 15:18:10.724017", "First?": "", "metadata": "{\"field1\":\"NULL\"}"}
How can I avoid this?
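You can decode `metadata` into a Python dict with `json.loads` as you read each row, so that `json.dumps` nests it in the output instead of escaping it as a string. As a side note, `testls` is already a list, so wrapping it in another list comprehension is unnecessary.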
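To see why the escaping happens, here is a minimal illustration on a single hard-coded row (no S3 involved; the row simply mirrors the output shown in the question). `csv.DictReader` returns every value as a `str`, so `json.dumps` treats the JSON text as an ordinary string and escapes its quotes:

```python
import json

# A row as csv.DictReader returns it: every value is a str, including the
# JSON text in the "metadata" column.
row = {"Date": "2020-03-02 15:18:10.724017", "First?": "", "metadata": '{"field1":"NULL"}'}

# Dumping the str as-is escapes its quotes, so the JSON ends up double-encoded.
escaped = json.dumps(row)

# Decoding the column first turns it into a dict, which json.dumps nests.
row["metadata"] = json.loads(row["metadata"])
nested = json.dumps(row)

print(escaped)  # ... "metadata": "{\"field1\":\"NULL\"}"}
print(nested)   # ... "metadata": {"field1": "NULL"}}
```

Applied to the original S3 loop, the fix looks like this: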
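import csv
import io
import json
import boto3
s3 = boto3.client('s3', 'us-east-1')
bucket = "bucket"
key = "key"
obj = s3.get_object(Bucket=bucket, Key=key)
text = obj['Body'].read().decode('utf-8')
# Take the column names from the header row so that the "metadata" key exists
# (with integer fieldnames such as range(0, 13), row["metadata"] raises KeyError).
# If the header names differ, pass the real column names as fieldnames and skip the header line.
testls = []
for row in csv.DictReader(io.StringIO(text)):
    # Decode the embedded JSON so json.dumps nests it instead of escaping it
    row["metadata"] = json.loads(row["metadata"])
    testls.append(row)
out = json.dumps(testls)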
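If you want to test the conversion without touching S3, the loop can be pulled into a small function that takes the decoded CSV text. This is a sketch under a few assumptions: the file has a header row with a `metadata` column, the JSON cell is quoted in the CSV (inner quotes doubled as `""`), and an empty `metadata` cell should become `null` rather than raise `json.JSONDecodeError`. The function name `csv_to_json` and the sample data are hypothetical. Wrapping the text in `io.StringIO` lets the `csv` module handle quoted fields itself, including ones that contain newlines, which splitting on `'\n'` by hand would break apart.

```python
import csv
import io
import json


def csv_to_json(text):
    """Convert CSV text with a header row into a JSON array string,
    decoding the JSON stored in the "metadata" column (hypothetical helper)."""
    rows = []
    # io.StringIO gives csv a file-like object, so quoted fields (even ones
    # spanning several lines) are parsed correctly.
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["metadata"]
        # Assumption: an empty metadata cell becomes null instead of
        # raising json.JSONDecodeError.
        row["metadata"] = json.loads(raw) if raw else None
        rows.append(row)
    return json.dumps(rows)


# Hypothetical sample data; in the real code the text would come from
# obj['Body'].read().decode('utf-8').
sample = (
    'Date,First?,metadata\n'
    '2020-03-02 15:18:10.724017,,"{""field1"":""NULL""}"\n'
)
print(csv_to_json(sample))
# [{"Date": "2020-03-02 15:18:10.724017", "First?": "", "metadata": {"field1": "NULL"}}]
```

Because the decoded value is a real dict, `json.dumps` emits it as a nested object, with no backslash escaping.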