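I am creating pandas DataFrames from a list of files and trying to add a column (day) to each one that holds the index of the file it came from. Code: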
for file in list_of_files:
    df = pd.read_csv(file, delimiter=',')
    df['day'] = []
    df['day'] = list_of_files.index(file)
    df2 = []
    df2.append(df)
frame = pd.concat(df2, axis=0, ignore_index=True)
The result I want:
     i       SECCODE  BUYSELL       TIME  ORDERNO  ACTION    PRICE  VOLUME  TRADENO  TRADEPRICE  day
0   18  SU25080RMFS1        B  100000000       18       1  97.5228     204      NaN         NaN    0
1   19  SU26203RMFS8        B  100043856       19       1  98.8707     206      NaN         NaN    0
2   20  SU26206RMFS1        B  103543575       20       1  97.1110     208      NaN         NaN    0
3  184  SU26205RMFS3        S  100000000      184       1  93.0000       1      NaN         NaN    1
4  185  SU26205RMFS3        S  100000000      185       1  93.1000       1      NaN         NaN    1
The error I get:
ValueError Traceback (most recent call last)
<ipython-input-86-bb0a3bc69a9b> in <module>
1 for file in list_of_files:
2 df = pd.read_csv(file, delimiter=',')
----> 3 df['day'] = []
4 df['day'] = list_of_files.index(file)
5 df2 = []
D:\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3368 else:
3369 # set column
-> 3370 self._set_item(key, value)
3371
3372 def _setitem_slice(self, key, value):
D:\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3443
3444 self._ensure_valid_index(value)
-> 3445 value = self._sanitize_column(key, value)
3446 NDFrame._set_item(self, key, value)
3447
D:\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
3628
3629 # turn me into an ndarray
-> 3630 value = sanitize_index(value, self.index, copy=False)
3631 if not isinstance(value, (np.ndarray, Index)):
3632 if isinstance(value, list) and len(value) > 0:
D:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index, copy)
517
518 if len(data) != len(index):
--> 519 raise ValueError('Length of values does not match length of index')
520
521 if isinstance(data, ABCIndexClass) and not copy:
ValueError: Length of values does not match length of index
If I remove the df['day'] = [] line, the result contains only the last file, so the last index is the only value in the day column.
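If I understand correctly, this should do what you need. Use enumerate to get the index while looping over the files, and rather than using df.append inside the loop, collect the DataFrames in a list and call pd.concat once at the end to build the final DataFrame: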
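import pandas as pd

dfs = []  # create the list once, before the loop
for day, file in enumerate(list_of_files):
    df = pd.read_csv(file, delimiter=',')
    df['day'] = day  # a scalar is broadcast to every row
    dfs.append(df)

# concatenate all per-file frames in one call
final = pd.concat(dfs, ignore_index=True)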
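To try the approach without the real files, here is a minimal sketch that swaps the CSV paths for in-memory io.StringIO buffers. The two sample buffers and the extra column name day_bad are made up for illustration and are not from the question. It also reproduces the ValueError: assigning an empty list to a column of a non-empty DataFrame fails because the list length (0) does not match the number of rows.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real CSV files in list_of_files
list_of_files = [
    io.StringIO("SECCODE,PRICE\nSU25080RMFS1,97.5228\nSU26203RMFS8,98.8707\n"),
    io.StringIO("SECCODE,PRICE\nSU26205RMFS3,93.0\n"),
]

dfs = []  # created once, outside the loop, so earlier files are not discarded
for day, file in enumerate(list_of_files):
    df = pd.read_csv(file, delimiter=',')
    df['day'] = day  # scalar assignment fills every row of this file
    dfs.append(df)

final = pd.concat(dfs, ignore_index=True)
print(final)

# Reproduce the original error: an empty list cannot fill a non-empty column
raised = False
try:
    dfs[0]['day_bad'] = []
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The day column should be 0 for the rows of the first buffer and 1 for the row of the second. In the question's code, df2 = [] sits inside the loop, so the list is reset on every iteration and only the last file reaches pd.concat.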