How do I read a binary file into a Pandas DataFrame using Numpy dtypes?


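I want to drop rows from a DataFrame that was built by reading a binary file with a Numpy.dtype template. I have tried several ways of dropping a row and keep getting blocked by an error, usually: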

TypeError: void() takes at least 1 positional argument (0 given)

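Opening the variable explorer in my IDE shows the same error when it tries to inspect the column names, which suggests that the way I extract the data is somehow corrupting the column names.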

I load the data as follows (the number of variables is shortened here for brevity):

```
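import numpy as np
import pandas as pd

# Record layout: 22 raw header bytes, then two big-endian unsigned integers
data_template = np.dtype([
    ('header_a', 'V22'),
    ('variable_A', '>u2'),
    ('gpssec', '>u4'),
])

# source_file is the path to the binary file
with open(source_file, 'rb') as f:
    byte_data = f.read()
np_data = np.frombuffer(byte_data, data_template)
df = pd.DataFrame(np_data)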
```
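
For readers who do not have the original file, the loading step can be reproduced against in-memory bytes. This is a minimal sketch, not the real data: the two records and their values (`hdr-one`, `7`, `1234` and so on) are made up, and `struct.pack` is used only to fabricate bytes in the same 28-byte big-endian layout that the template describes.

```python
import struct
import numpy as np

data_template = np.dtype([
    ('header_a', 'V22'),
    ('variable_A', '>u2'),
    ('gpssec', '>u4'),
])

# Fabricate two records: 22 header bytes, a big-endian uint16, a big-endian uint32.
# The values are invented purely for illustration.
raw = (
    struct.pack('>22sHI', b'hdr-one', 7, 1234)
    + struct.pack('>22sHI', b'hdr-two', 8, 2000)
)

# Each record occupies exactly one itemsize of the template (22 + 2 + 4 bytes)
records = np.frombuffer(raw, dtype=data_template)

print(data_template.itemsize)
print(records['variable_A'].tolist())
print(records['gpssec'].tolist())
```

The buffer length has to be a whole multiple of `data_template.itemsize`, otherwise `np.frombuffer` raises a `ValueError`.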

When I try to filter the DataFrame, for example with:

`df = df[df['gpssec'] > 1000]`

I get:

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\frame.py:3798 in __getitem__
      return self._getitem_bool_array(key)

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\frame.py:3853 in _getitem_bool_array
      return self._take_with_is_copy(indexer, axis=0)

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\generic.py:3902 in _take_with_is_copy
      result = self._take(indices=indices, axis=axis)

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\generic.py:3886 in _take
      new_data = self._mgr.take(

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\managers.py:978 in take
      return self.reindex_indexer(

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\managers.py:751 in  reindex_indexer
      new_blocks = [

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\managers.py:752 in <listcomp>
      blk.take_nd(

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\internals\blocks.py:880 in take_nd
      new_values = algos.take_nd(

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\array_algos\take.py:117 in take_nd
      return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\array_algos\take.py:134 in _take_nd_ndarray
      dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(

    File C:\ProgramData\anaconda311\Lib\site-packages\pandas\core\array_algos\take.py:582 in _take_preprocess_indexer_and_fill_value
      dtype, fill_value = arr.dtype, arr.dtype.type()

    TypeError: void() takes at least 1 positional argument (0 given)


I've been able to work around the problem by copying each column of relevant data into a blank DataFrame that doesn't have the corrupt headers, but it's a kludgy solution. Not sure this qualifies as a bug as it's very likely it's a user error, but I can't find anything obvious I'm doing wrong.
python pandas dataframe numpy binary

1 Answer

In [230]: data_template = np.dtype([
     ...:     ('header_a','V22'),
     ...:     ('variable_A','>u2'),
     ...:     ('gpssec','>u4')
     ...:     ])

Create a dummy array from this dtype:

In [231]: arr = np.zeros(4, data_template)
In [232]: arr
Out[232]: 
array([(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0),
       (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0),
       (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0),
       (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 0, 0)],
      dtype=[('header_a', 'V22'), ('variable_A', '>u2'), ('gpssec', '>u4')])
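
The `header_a` field is what sets this array apart: `V22` is a raw "void" type. The traceback in the question ends on `arr.dtype.type()`, a call with no arguments. The sketch below runs that same call in isolation on the field dtypes. It illustrates the failing call only and makes no claim about the rest of pandas' internals.

```python
import numpy as np

data_template = np.dtype([
    ('header_a', 'V22'),
    ('variable_A', '>u2'),
    ('gpssec', '>u4'),
])

# Look up the dtype of a single field, then its scalar type
header_dtype = data_template['header_a']
print(header_dtype, header_dtype.type)

# np.void cannot be constructed without arguments
try:
    header_dtype.type()
    void_error = None
except TypeError as exc:
    void_error = exc
print(repr(void_error))

# A numeric field's scalar type can be constructed with no arguments
gps_default = data_template['gpssec'].type()
print(gps_default)
```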

We can make a DataFrame from it:

In [233]: df = pd.DataFrame(arr)

In [234]: df.describe()
Out[234]: 
       variable_A  gpssec
count         4.0     4.0
mean          0.0     0.0
std           0.0     0.0
min           0.0     0.0
25%           0.0     0.0
50%           0.0     0.0
75%           0.0     0.0
max           0.0     0.0

But displaying the frame, or calling `info`, raises an error:

In [235]: df.info()
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
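
One possible way around this, offered as a sketch rather than a confirmed fix, is to keep the void field out of the DataFrame in its raw form. The code below assumes the problems come only from the `V22` column. It builds the frame field by field, storing the header as Python `bytes` objects and casting the big-endian integers to native byte order. The `gpssec` values are invented for illustration.

```python
import numpy as np
import pandas as pd

data_template = np.dtype([
    ('header_a', 'V22'),
    ('variable_A', '>u2'),
    ('gpssec', '>u4'),
])

# Dummy records standing in for the file contents (values are made up)
arr = np.zeros(4, data_template)
arr['gpssec'] = [500, 1500, 2500, 900]

df = pd.DataFrame({
    # Each np.void scalar becomes a plain bytes object (object-dtype column)
    'header_a': [v.tobytes() for v in arr['header_a']],
    # Cast big-endian fields to native-endian unsigned integers
    'variable_A': arr['variable_A'].astype('u2'),
    'gpssec': arr['gpssec'].astype('u4'),
})

# Boolean filtering, as attempted in the question
filtered = df[df['gpssec'] > 1000]
print(filtered['gpssec'].tolist())
```

If the header bytes are not needed, leaving the `header_a` entry out of the dictionary altogether is simpler still.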