Pandas returns "Passed header names mismatches usecols" error

Question

The following works as expected. There are 190 columns, and all of them are read in just fine.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             # usecols=columns[:10], 
             nrows=10
             )

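I have used the usecols parameter before, so I am puzzled as to why it no longer works for me. I assumed that simply slicing the first 10 column names would be easy, but I still get the "Passed header names mismatches usecols" error.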

I am using pandas 0.16.2.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             usecols=columns[:10], 
             nrows=10
             )

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44> in <module>()
      3                     nrows=10,
      4                     header=None,
----> 5                     names=columns,
      6                     )

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    472                     skip_blank_lines=skip_blank_lines)
    473 
--> 474         return _read(filepath_or_buffer, kwds)
    475 
    476     parser_f.__name__ = name

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    248 
    249     # Create the parser.
--> 250     parser = TextFileReader(filepath_or_buffer, **kwds)
    251 
    252     if (nrows is not None) and (chunksize is not None):

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    564             self.options['has_index_names'] = kwds['has_index_names']
    565 
--> 566         self._make_engine(self.engine)
    567 
    568     def _get_options_with_defaults(self, engine):

/.../m9tn/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    703     def _make_engine(self, engine='c'):
    704         if engine == 'c':
--> 705             self._engine = CParserWrapper(self.f, **self.options)
    706         else:
    707             if engine == 'python':

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1070         kwds['allow_leading_cols'] = self.index_col is not False
   1071 
-> 1072         self._reader = _parser.TextReader(src, **kwds)
   1073 
   1074         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)()

pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:7330)()

ValueError: Passed header names mismatches usecols

Answer 1

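It turned out that the dataset has 191 columns (not 190). pandas was automatically setting my first column of data as the index. I am not quite sure why that makes it raise an error, since all of the columns in usecols do actually exist in the parsed dataset.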

So the solution is to confirm that the number of entries in names corresponds exactly to the number of columns in the dataset.
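One way to run that check before calling read_csv is to count the fields in the first row of the file and compare the result with the length of the names list. The following is only a minimal sketch in Python 3 using the standard library; `count_columns` is a hypothetical helper, and the in-memory sample stands in for data.csv, whose real contents are not shown in the question.

```python
import csv
import io


def count_columns(csv_file, delimiter=","):
    """Return the number of fields in the first row of an open CSV file."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    first_row = next(reader)
    return len(first_row)


# Hypothetical stand-in for data.csv: every row has 4 fields.
sample = io.StringIO("a,1,2,3\nb,4,5,6\n")

# One name short, analogous to 190 names for a 191-column file.
columns = ["x", "y", "z"]

n_fields = count_columns(sample)
names_match = (n_fields == len(columns))
print(n_fields, len(columns), names_match)
```

With a real file, the same helper could be called on `open("data.csv", newline="")`; if `names_match` is false, the names list needs fixing before usecols is applied.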

Also, I found this discussion on GitHub.

Answer 2

For anyone out there debugging this error: it can also be caused by a missing trailing comma in your list of column names. For example:

    columns = [
        'industry',
        'amount'
        'date',
        ...
    ]

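Python joins the adjacent string literals 'amount' and 'date' into a single 'amountdate', so the list of column names ends up one entry shorter than you expect.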
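The effect is easy to reproduce without pandas, because the joining happens when Python parses the list literal. This is a small sketch; the three column names come from the example above, and `expected_count` is an illustrative variable, not part of any API.

```python
# The comma after 'amount' is missing on purpose.
columns = [
    'industry',
    'amount'
    'date',
]

# Adjacent string literals are concatenated, so only two names remain.
expected_count = 3
print(columns)
print(len(columns) == expected_count)
```

Comparing `len(columns)` against the number of columns you intended to list is a cheap guard against this kind of typo.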
