将嵌套数组传输到 csv 文件

问题描述 投票:0回答:1

我目前正在使用 .mat 格式的数据集。然而,我遇到了挑战,因为数据集包含嵌套数组,并且我需要利用 CSV 格式的数据。

我正在寻求有关将此嵌套 .mat 数据集转换为 CSV 格式的最有效方法的指导。我们将非常感谢您在这方面的专业知识。 我的数据集链接: https://ora.ox.ac.uk/objects/uuid:03ba4b01-cfed-46d3-9b1a-7d4a7bdf6fac/files/m5ac36a1e2073852e4f1f7dee647909a7

import numpy as np
import pandas as pd
import scipy.io as sio
mat = sio.loadmat('Oxford_Battery_Degradation_Dataset_1.mat')
mat

我的输出

{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jun 05 11:16:25 2017',
 '__version__': '1.0',
 '__globals__': [],
 'Cell1': array([[(array([[(array([[(array([[735954.85896553],
                                [735954.8589771 ],
                                [735954.85898867],
                                ...,
                                [735954.8995558 ],



dtype=[('t', 'O'), ('v', 'O'), ('q', 'O'), ('T', 'O')]))                                   ]],
               dtype=[('C1ch', 'O'), ('C1dc', 'O'), ('OCVch', 'O'), ('OCVdc', 'O')]))                                            ]],
       dtype=[('cyc0000', 'O'), ('cyc0100', 'O'), ('cyc0300', 'O'), ('cyc0400', 'O'), ('cyc0500', 'O'), ('cyc0600', 'O'), ('cyc0700', 'O'), ('cyc0800', 'O'), ('cyc0900', 'O'), ('cyc1000', 'O'), ('cyc1100', 'O'), ('cyc1200', 'O'), ('cyc1300', 'O'), ('cyc1400', 'O'), ('cyc1600', 'O'), ('cyc1800', 'O'), ('cyc1900', 'O'), ('cyc2000', 'O'), ('cyc2100', 'O'), ('cyc2200', 'O'), ('cyc2300', 'O'), ('cyc2400', 'O'), ('cyc2500', 'O'), ('cyc2600', 'O'), ('cyc2700', 'O'), ('cyc2800', 'O'), ('cyc2900', 'O'), ('cyc3000', 'O'), ('cyc3100', 'O'), ('cyc3200', 'O'), ('cyc3300', 'O'), ('cyc3500', 'O'), ('cyc3600', 'O'), ('cyc3700', 'O'), ('cyc3800', 'O'), ('cyc3900', 'O'), ('cyc4000', 'O'), ('cyc4100', 'O'), ('cyc4200', 'O'), ('cyc4300', 'O'), ('cyc4400', 'O'), ('cyc4500', 'O'), ('cyc4600', 'O'), ('cyc4800', 'O'), ('cyc5000', 'O'), ('cyc5100', 'O'), ('cyc5200', 'O'), ('cyc5300', 'O'), ('cyc5400', 'O'), ('cyc5500', 'O'), ('cyc5600', 'O'), ('cyc5700', 'O'), ('cyc5800', 'O'), ('cyc5900', 'O'), ('cyc6000', 'O'), ('cyc6100', 'O'), ('cyc6200', 'O'), ('cyc6300', 'O'), ('cyc6400', 'O'), ('cyc6500', 'O'), ('cyc6600', 'O'), ('cyc6700', 'O'), ('cyc6800', 'O'), ('cyc6900', 'O'), ('cyc7000', 'O'), ('cyc7100', 'O'), ('cyc7200', 'O'), ('cyc7300', 'O'), ('cyc7400', 'O'), ('cyc7500', 'O'), ('cyc7600', 'O'), ('cyc7700', 'O'), ('cyc7800', 'O'), ('cyc7900', 'O'), ('cyc8000', 'O'), ('cyc8100', 'O')])}

事实上,我应该有八个这种格式的数据集,其中列与数组中的“t”、“v”、“q”和“T”相关联。有一个样本代表一个细胞数据集的预期结果:

cell8= pd.DataFrame(columns=['Time','Voltage','Capacity','Temperature'])
cell8
python csv transform mat
1个回答
0
投票

我不确定您是否意识到这里的数据量。我有可以提取数据的代码,但这里的数据项刚刚超过 6100 万个。打印为 CSV 文件,大小约为 2.5 GB。

import numpy as np
import scipy.io as sio
mat = sio.loadmat('Oxford_Battery_Degradation_Dataset_1.mat')

def dive(names,cell):
    global lines
    if len(cell) > 1000:
        for n in cell:
            print(','.join(names+[str(n[0])]))
    elif len(cell) > 1:
        for n,c in zip(cell.dtype.fields, cell):
            dive(names+[n], c)
    else:
        dive(names,cell[0])

for cno in range(8):
    name = f'Cell{cno+1}'
    cell = mat[name]
    dive([name],mat[name])

该文件的开头如下所示:

Cell1,cyc0000,C1ch,t,735954.8589655256
Cell1,cyc0000,C1ch,t,735954.8589770996
Cell1,cyc0000,C1ch,t,735954.8589886738
Cell1,cyc0000,C1ch,t,735954.8590002478
Cell1,cyc0000,C1ch,t,735954.8590118219
Cell1,cyc0000,C1ch,t,735954.859023396
Cell1,cyc0000,C1ch,t,735954.85903497
Cell1,cyc0000,C1ch,t,735954.8590465442
Cell1,cyc0000,C1ch,t,735954.8590581182
Cell1,cyc0000,C1ch,t,735954.8590696923
Cell1,cyc0000,C1ch,t,735954.8590812663
Cell1,cyc0000,C1ch,t,735954.8590928405
Cell1,cyc0000,C1ch,t,735954.8591044145
Cell1,cyc0000,C1ch,t,735954.8591159886
Cell1,cyc0000,C1ch,t,735954.8591275626
Cell1,cyc0000,C1ch,t,735954.8591391367
Cell1,cyc0000,C1ch,t,735954.8591507107
Cell1,cyc0000,C1ch,t,735954.8591622849
Cell1,cyc0000,C1ch,t,735954.8591738589
Cell1,cyc0000,C1ch,t,735954.859185433
Cell1,cyc0000,C1ch,t,735954.8591970071
Cell1,cyc0000,C1ch,t,735954.8592085812
Cell1,cyc0000,C1ch,t,735954.8592201553
Cell1,cyc0000,C1ch,t,735954.8592317293
Cell1,cyc0000,C1ch,t,735954.8592433034
Cell1,cyc0000,C1ch,t,735954.8592548774
Cell1,cyc0000,C1ch,t,735954.8592664516
Cell1,cyc0000,C1ch,t,735954.8592780256
Cell1,cyc0000,C1ch,t,735954.8592895997
Cell1,cyc0000,C1ch,t,735954.8593011737
Cell1,cyc0000,C1ch,t,735954.8593127478
Cell1,cyc0000,C1ch,t,735954.8593243218
Cell1,cyc0000,C1ch,t,735954.859335896
Cell1,cyc0000,C1ch,t,735954.8593474701
Cell1,cyc0000,C1ch,t,735954.8593590441
Cell1,cyc0000,C1ch,t,735954.8593706182
Cell1,cyc0000,C1ch,t,735954.8593821923
...

第一列运行 Cell1 到 Cell8。第二列有 70 到 80 个条目,

cyc0000
cyc0100
等。第三列有 4 个条目,
C1ch
C1dc
OCVch
OCVdc
。第四列有 4 个条目,
t
v
q
T
。您无法运行这些数字,因为最后一个维度的大小变化很大,从 2,500 到 10,000 个条目。

所以,我想你必须弄清楚你想要什么。如果您想要的话,您可以将其转换为一组嵌套字典。

© www.soinside.com 2019 - 2024. All rights reserved.