Import multiple files as pandas DataFrames and add an index for each file

Question (votes: 0, answers: 1)

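I am building a pandas DataFrame from a list of files and trying to add a column (day) that holds, for every row, the index of the file it came from. Code: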

for file in list_of_files:
    df = pd.read_csv(file, delimiter=',')
    df['day'] = []
    df['day'] = list_of_files.index(file)
    df2 = []
    df2.append(df)
frame = pd.concat(df2, axis=0, ignore_index=True)

The result I want:

     i       SECCODE  BUYSELL       TIME  ORDERNO  ACTION    PRICE  VOLUME  TRADENO  TRADEPRICE  day
0   18  SU25080RMFS1        B  100000000       18       1  97.5228     204      NaN         NaN    0
1   19  SU26203RMFS8        B  100043856       19       1  98.8707     206      NaN         NaN    0
2   20  SU26206RMFS1        B  103543575       20       1  97.1110     208      NaN         NaN    0
3  184  SU26205RMFS3        S  100000000      184       1  93.0000       1      NaN         NaN    1
4  185  SU26205RMFS3        S  100000000      185       1  93.1000       1      NaN         NaN    1

The error I get:

ValueError                                Traceback (most recent call last)
<ipython-input-86-bb0a3bc69a9b> in <module>
      1 for file in list_of_files:
      2     df = pd.read_csv(file, delimiter=',')
----> 3     df['day'] = []
      4     df['day'] = list_of_files.index(file)
      5     df2 = []

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371 
   3372     def _setitem_slice(self, key, value):

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3443 
   3444         self._ensure_valid_index(value)
-> 3445         value = self._sanitize_column(key, value)
   3446         NDFrame._set_item(self, key, value)
   3447 

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3628 
   3629             # turn me into an ndarray
-> 3630             value = sanitize_index(value, self.index, copy=False)
   3631             if not isinstance(value, (np.ndarray, Index)):
   3632                 if isinstance(value, list) and len(value) > 0:

D:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index, copy)
    517 
    518     if len(data) != len(index):
--> 519         raise ValueError('Length of values does not match length of index')
    520 
    521     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index

If I remove the line df['day'] = [], I only get the last file's index as the index for all files.

python pandas dataframe indexing
1 Answer
0 votes

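If I understand correctly, this should do what you need. Use enumerate to get each file's index while looping over the files, and rather than using df.append inside the loop, collect the DataFrames in a list and call pd.concat once at the end to build the final DataFrame: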

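import pandas as pd

dfs = []  # created once, before the loop, so it collects every file
# enumerate yields (position, file), so day is the file's index in the list
for day, file in enumerate(list_of_files):
    df = pd.read_csv(file, delimiter=',')
    df['day'] = day  # a scalar is broadcast to every row of this file
    dfs.append(df)

# concatenate once, after the loop, and renumber the rows 0..n-1
final = pd.concat(dfs, ignore_index=True)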
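
For context, the loop in the question appears to fail for two separate reasons. Assigning an empty list to a column of a non-empty DataFrame raises the length-mismatch ValueError, and df2 = [] inside the loop resets the list on every iteration, so only the last file survives. The sketch below illustrates both points. It assumes two tiny in-memory CSVs in place of real files; the sample rows and the use of io.StringIO are made up for illustration and are not from the question.

```python
import io

import pandas as pd

# Hypothetical stand-ins for files on disk: two small in-memory CSVs.
list_of_files = [
    io.StringIO("SECCODE,PRICE\nSU25080RMFS1,97.5228\nSU26203RMFS8,98.8707\n"),
    io.StringIO("SECCODE,PRICE\nSU26205RMFS3,93.0\n"),
]

dfs = []  # defined before the loop so it is not reset on each iteration
for day, file in enumerate(list_of_files):
    df = pd.read_csv(file, delimiter=',')
    df['day'] = day  # scalar assignment fills the whole column
    dfs.append(df)

final = pd.concat(dfs, ignore_index=True)
print(final)

# Assigning an empty list to a non-empty frame raises ValueError,
# which is the error shown in the question's traceback.
empty_list_failed = False
try:
    df['day2'] = []
except ValueError as exc:
    empty_list_failed = True
    print("ValueError:", exc)
```

Here the two rows from the first CSV get day 0 and the single row from the second gets day 1. With real files, list_of_files would simply hold the paths, as in the question.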