I have an Excel worksheet exported by a software tool, structured as shown below. Excel structure:
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | not relevant |
+---+-------+--------------+--------------+
| | | X1 | Y1 |
+---+-------+--------------+--------------+
|fr | Time | not relevant | not relevant |
+---+-------+--------------+--------------+
| 1 | 0.000 | 12 | 32 |
+---+-------+--------------+--------------+
| 2 | 0.010 | 23 | 3 |
+---+-------+--------------+--------------+
| 3 | 0.020 | 45 | 4 |
+---+-------+--------------+--------------+
| 4 | 0.030 | 4 | 1 |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | Y2 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 5 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 89 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 5 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 3 | |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | X3 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 17 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 2 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 4 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 23 | |
+---+-------+--------------+--------------+
CSV structure:
,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
I am looking for a fast way to convert this messy data into a tidy pandas DataFrame.
The final result should look like this.
Time X1 Y1 Y2 X3
0.000 12 32 5 17
0.010 23 3 89 2
0.020 45 4 5 4
0.030 4 1 3 23
This is what I did... I'm not very happy with it, but it works.
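import numpy as np
import pandas as pd

filename = 'test_data'
df = pd.read_excel(filename + '.xlsx', header=None)

# Split at every fully blank row; each piece starts with its blank separator row.
df_list = np.split(df, df[df.isnull().all(axis=1)].index)
del df_list[0]  # the first piece is empty because row 0 is blank

for i, df in enumerate(df_list):
    # Copy the variable names (X1, Y1, ...) into the fr/Time header row.
    df.iloc[3, 2:] = df.iloc[2, 2:]
    new_header = df.iloc[3]
    df.columns = new_header
    df = df.iloc[4:]
    # The frame-counter column is labelled 'fr' in the sample above.
    df_tmp = df.drop(['fr'], axis=1)
    df = df_tmp.set_index("Time")
    # Drop the unused, all-empty columns of the narrower blocks.
    df.dropna(axis=1, how='all', inplace=True)
    df.columns.name = None
    df_list[i] = df

df = pd.concat(df_list, axis=1)
# Note: this sorts the columns alphabetically (X1, X3, Y1, Y2), not in file order.
df = df.reindex(sorted(df.columns), axis=1)
df.to_csv(filename + '.csv')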
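
For comparison, here is a minimal sketch of the same block-splitting idea, run against the CSV sample above instead of the Excel file so that it is self-contained. The names RAW_CSV and tidy_blocks are made up for illustration. It assumes that every block has the same four header rows (blank, "not relevant", names, fr/Time) and repeats the same Time values; if the real export differs, the row offsets would need adjusting.

```python
import io

import pandas as pd

# The CSV sample from above, embedded so the sketch runs on its own.
RAW_CSV = """,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
"""


def tidy_blocks(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: merge the stacked blocks into one frame indexed by Time."""
    # Every fully blank row starts a new block; cumsum gives each row a block id.
    block_id = raw.isna().all(axis=1).cumsum()
    pieces = []
    for _, block in raw.groupby(block_id):
        # Assumed layout per block: blank row, 'not relevant', names, fr/Time, data.
        names = block.iloc[2, 2:].dropna()
        data = block.iloc[4:]
        # Keep only the columns that carry a name, and label them with it.
        piece = data[names.index].set_axis(names.to_list(), axis=1)
        piece.index = pd.Index(data[1].astype(float), name="Time")
        pieces.append(piece.astype(int))
    # The blocks share the same Time values, so concat aligns them side by side.
    return pd.concat(pieces, axis=1)


# Read everything as text so the header rows and data rows can share columns.
raw = pd.read_csv(io.StringIO(RAW_CSV), header=None, dtype=str)
tidy = tidy_blocks(raw)
print(tidy)
```

Because nothing is sorted at the end, the columns come out in file order (X1, Y1, Y2, X3), which matches the desired result shown earlier. For the real file, the read_csv call could presumably be swapped for the read_excel call used above, with the integer conversion adjusted to whatever types the Excel cells hold.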