numpy.genfromtxt: does uneven whitespace between columns cause a dtype error?

Question

The data I'm working with can be found at this gist, and looks like:

07-11-2018 18:34:35 -2.001   5571.036 -1.987
07-11-2018 18:34:50 -1.999   5570.916 -1.988

[Image: code and output in Jupyter Notebook]

When calling

import numpy as np

TB_CAL_array = np.genfromtxt('calbath_data/TB118192.TXT',
                             skip_header=10,
                             dtype=[("date", "<U10"), ("time", "<U8"), ("bathtemp", "<f8"),
                                    ("SBEfreq", "<f8"), ("SBEtemp", "<f8")])

the array output is:

array([('07-11-2018', '18:34:35', -2.001e+00, 5571.036, -1.987),
   ('07-11-2018', '18:34:50', -1.999e+00, 5570.916, -1.988),

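The data comes out as a structured ndarray of tuples, and the array is non-homogeneous because it contains both strings and floats (see the related question "numpy.genfromtxt produces array of what looks like tuples, not a 2D array - why?").

Note: the third column of the output appears to have been treated as something other than the specified dtype.

The output should be -2.001, but it is shown as -2.001e+00 instead.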

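Note that the fifth column has the same input format and the same dtype specification, yet no such conversion happens there during the genfromtxt call.

The only difference I can find between "bathtemp" and "SBEtemp" is that there are two extra spaces after the "bathtemp" column.

But according to the numpy.genfromtxt IO documentation this should not matter, because consecutive whitespace should automatically be treated as a single delimiter:

delimiter : str, int, or sequence, optional. The string used to separate values. By default, any consecutive whitespaces act as delimiter. An integer or sequence of integers can also be provided as width(s) of each field.

Are the extra spaces after the "bathtemp" column causing the error? If so, how can I fix it?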

Answer 1

With your sample:

In [136]: txt="""07-11-2018 18:34:35 -2.001   5571.036 -1.987 
     ...: 07-11-2018 18:34:50 -1.999   5570.916 -1.988"""                       
In [137]: np.genfromtxt(txt.splitlines(), dtype=None, encoding=None)            
Out[137]: 
array([('07-11-2018', '18:34:35', -2.001, 5571.036, -1.987),
       ('07-11-2018', '18:34:50', -1.999, 5570.916, -1.988)],
      dtype=[('f0', '<U10'), ('f1', '<U8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])

And with your dtype:

In [139]: np.genfromtxt(txt.splitlines(),
     ...:               dtype=[("date", "<U10"), ("time", "<U8"), ("bathtemp", "<f8"),
     ...:                      ("SBEfreq", "<f8"), ("SBEtemp", "<f8")],
     ...:               encoding=None)
Out[139]: 
array([('07-11-2018', '18:34:35', -2.001, 5571.036, -1.987),
       ('07-11-2018', '18:34:50', -1.999, 5570.916, -1.988)],
      dtype=[('date', '<U10'), ('time', '<U8'), ('bathtemp', '<f8'), ('SBEfreq', '<f8'), ('SBEtemp', '<f8')])
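As a rough check of the point above, the following sketch parses the same two rows twice, once with the original uneven spacing and once with every run of spaces collapsed to a single space, and compares the results field by field. The variable names (`uneven`, `even`, `same`) are made up for illustration; the rows are the two sample lines from the question.

```python
import numpy as np

# The two sample rows from the question, with the uneven spacing preserved.
uneven = ["07-11-2018 18:34:35 -2.001   5571.036 -1.987",
          "07-11-2018 18:34:50 -1.999   5570.916 -1.988"]
# Same rows with every run of whitespace collapsed to one space.
even = [" ".join(line.split()) for line in uneven]

dtype = [("date", "<U10"), ("time", "<U8"), ("bathtemp", "<f8"),
         ("SBEfreq", "<f8"), ("SBEtemp", "<f8")]

a = np.genfromtxt(uneven, dtype=dtype, encoding=None)
b = np.genfromtxt(even, dtype=dtype, encoding=None)

# Compare each named field of the two structured arrays.
same = all(np.array_equal(a[name], b[name]) for name in a.dtype.names)
print(same)
```

If `same` is true, the extra spaces made no difference to what genfromtxt stored.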

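A value like -2.001e+00 is the same as -2.001. numpy chooses scientific notation when the range of values is wide enough, or when some values are too small to display well in fixed-point form.

For example, if I change one of the values to be much smaller: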

In [140]: data = _                                                              
In [141]: data['bathtemp']                                                      
Out[141]: array([-2.001, -1.999])
In [142]: data['bathtemp'][1] *= 0.001                                          
In [143]: data['bathtemp']                                                      
Out[143]: array([-2.001e+00, -1.999e-03])

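The -2.001 is unchanged (apart from the display style).

My guess is that some of the bathtemp values (ones you don't show) are much closer to zero.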
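If the scientific notation is only a display nuisance, one option is to change numpy's print options rather than the data. The sketch below is an illustration, not part of the original session: it reuses the two bathtemp values, confirms that the stored number is unchanged, and uses `np.printoptions(suppress=True)` to request fixed-point display. The names `vals`, `scaled` and `suppressed` are hypothetical.

```python
import numpy as np

vals = np.array([-2.001, -1.999])
scaled = vals.copy()
scaled[1] *= 0.001          # one value is now much closer to zero

print(repr(vals))           # fixed-point display
print(repr(scaled))         # numpy switches to scientific notation

# The stored number itself is unchanged; only the display style differs.
assert scaled[0] == vals[0] == -2.001

# suppress=True asks numpy to print floats in fixed-point notation.
with np.printoptions(suppress=True):
    suppressed = repr(scaled)
print(suppressed)
```

`np.set_printoptions(suppress=True)` applies the same setting globally instead of only inside the `with` block.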

Answer 2

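I was able to get the output I wanted by switching to pd.read_csv, thanks to its optional skipinitialspace=True argument (see the reference here):

skipinitialspace : bool, default False. Skip spaces after delimiter.

Input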

import pandas as pd

colnames = ['date', 'time', 'bathtemp', 'SBEfreq', 'SBEtemp']
TB_CAL = pd.read_csv("calbath_data/TB118192.CAL", header=None, skiprows=10,
                     delimiter=" ", skipinitialspace=True, names=colnames)

Output

         date      time  bathtemp   SBEfreq  SBEtemp
0  07-11-2018  18:34:35    -2.001  5571.036   -1.987
1  07-11-2018  18:34:50    -1.999  5570.916   -1.988
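For readers without the data file, here is a minimal self-contained sketch of the same idea. It reads the two sample rows from an in-memory buffer (a hypothetical stand-in for the real file) and uses `sep=r"\s+"`, which pandas treats as "any run of whitespace", as an alternative to `delimiter=" "` combined with `skipinitialspace=True`. The names `sample` and `df` are assumptions made for the example.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the data file (rows from the question).
sample = io.StringIO(
    "07-11-2018 18:34:35 -2.001   5571.036 -1.987\n"
    "07-11-2018 18:34:50 -1.999   5570.916 -1.988\n"
)
colnames = ["date", "time", "bathtemp", "SBEfreq", "SBEtemp"]

# sep=r"\s+" treats any run of whitespace as a single delimiter.
df = pd.read_csv(sample, header=None, sep=r"\s+", names=colnames)

print(df)
print(df.dtypes)
```

The three numeric columns should come back as float64, with the date and time columns left as strings.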
2   07-11-2018  18:35:06    -1.997  5571.058    -1.987
