How do I write a numpy array to an avro file?


I want to write a numpy array to an avro file. Here is a small example of the numpy array:

import numpy as np
import random

# Fill a 4x3 float32 array with samples from a standard normal distribution.
np_array = np.zeros((4, 3), dtype=np.float32)
for i in range(4):
    for j in range(3):
        np_array[i, j] = random.gauss(0, 1)
print(np_array)

Output:

[[ 0.6490377   0.29544145 -1.109375  ]
 [ 1.0881975  -0.39123887 -0.36691198]
 [-1.2226632   0.8332004   0.2686829 ]
 [ 1.5417658   0.4520132  -0.03081623]]

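In my use case the numpy array has 5 million rows and 128 columns, so if possible I would like to write the array directly to avro, without spending memory on converting it to a dictionary and/or a Pandas DataFrame.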
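
To put the memory concern in numbers, here is a rough back-of-the-envelope sketch. It assumes the float32 dtype from the small example above and a 64-bit CPython build; the per-object size of a Python float comes from `sys.getsizeof` and ignores list and dict overhead, so the second figure is only a lower bound.

```python
import sys

rows, cols = 5_000_000, 128
bytes_per_float32 = 4  # float32, as in the small example

# Size of the raw numpy buffer.
array_bytes = rows * cols * bytes_per_float32
print(f"numpy buffer: {array_bytes / 1e9:.2f} GB")

# Lower bound for the same values held as individual Python float objects
# (container overhead for lists or dicts is not counted).
python_float_bytes = rows * cols * sys.getsizeof(1.0)
print(f"as Python floats: at least {python_float_bytes / 1e9:.2f} GB")
```

The gap between the two figures is why converting the whole array to Python objects up front is worth avoiding.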

python numpy avro
1 Answer

I was able to answer my own question.

import numpy as np
import random
np_array = np.zeros((4,3), dtype=np.float32)
for i in range(4):
    for j in range(3):
        np_array[i, j] = random.gauss(0, 1)
print(np_array)

Output:

[[ 0.6490377   0.29544145 -1.109375  ]
 [ 1.0881975  -0.39123887 -0.36691198]
 [-1.2226632   0.8332004   0.2686829 ]
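 [ 1.5417658   0.4520132  -0.03081623]]

import fastavro

# Each avro record is one row of the array, stored as an avro array of 32-bit floats.
schema_dict = {
    "doc": "test",
    "name": "test",
    "namespace": "test",
    "type": "array",
    "items": "float"
}
schema = fastavro.parse_schema(schema_dict)

filepath = "<filepath>"  # placeholder: replace with the path of the avro file

# fastavro.writer iterates over np_array, so every row is written as one record.
with open(filepath, "wb") as f:
    fastavro.writer(f, schema, np_array)

# Read the file back, one record (row) at a time.
with open(filepath, "rb") as f:
    reader = fastavro.reader(f)
    for record in reader:
        print(record)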

Output:

[ 0.6490377   0.29544145 -1.109375  ]
[ 1.0881975  -0.39123887 -0.36691198]
[-1.2226632   0.8332004   0.2686829 ]
[ 1.5417658   0.4520132  -0.03081623]