Iteration problem in a pandas DataFrame

Question

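I am reading data from an xlsx file using the headers in the first row (which contains the indices) and the first column (which contains the dates on which the indices were recorded, and their type). In the screenshot, you can see how the data is organized:

I figured out how to build a pandas DataFrame that reads one index. The result is a DataFrame of this kind:

I don't know how to read all the indices correctly in one go, for example with a loop or, better, a list comprehension.

Here is my solution. It partially works, but I can't work out how to iterate over

f'{index_names[1]}_{val}'

so that it applies to all the indices rather than just one. I also can't figure out how to change the

sheet['C' + str(item)]

entry so that it iterates over all the indices rather than just one.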

characteristic = [100, 200, 300, 400, 500, 600, 700]
index_names = [sheet[1][row].value for row in range(1,sheet.max_row) if sheet[1][row].value != None]

index_list = [pd.DataFrame(
{f'{index_names[1]}_{val}': [sheet['C' + str(item)].value for item in range(1,26) 
            if sheet['A' + str(item)].value == val] 
        for val in characteristic},
    index = ['April 12', 'April 20', 'April 29']
) for _ in range(39)]

Perhaps my code looks cumbersome and could be simplified.

UPD: if we add

index_list[0].to_dict('tight')

the result is as follows:

{'index': ['April 12', 'April 20', 'April 29'],
 'columns': ['Second_index_100',
  'Second_index_200',
  'Second_index_300',
  'Second_index_400',
  'Second_index_500',
  'Second_index_600',
  'Second_index_700'],
 'data': [[0.43927605317127927,
   -0.24029588928209195,
   0.26450969805682434,
   0.18810770500537646,
   0.26586690176009525,
   0.21631310872586834,
   0.32927840726651636],
  [0.16442875037513777,
   0.12442062805633937,
   0.06353459713174614,
   0.14329091121735923,
   0.17469551024592245,
   0.20938555077590043,
   0.17154589574351475],
  [0.4615041268976439,
   0.6488484892496023,
   0.28007883537118355,
   0.5962923255606478,
   0.5924116517116391,
   0.559117121673802,
   0.6458160644845848]],
 'index_names': [None],
 'column_names': [None]}

Answer 1

Assuming you have input like this after importing (with

df = pd.read_excel('input.xlsx', index_col=0)

):

          First_index  Second_index  Third_index
April 12          NaN           NaN          NaN
100               1.0           2.0          3.0
200               4.0           5.0          6.0
300               7.0           8.0          9.0
400              10.0          11.0         12.0
500              13.0          14.0         15.0
600              16.0          17.0         18.0
700              19.0          20.0         21.0
April 20          NaN           NaN          NaN
100              22.0          23.0         24.0
200              25.0          26.0         27.0
300              28.0          29.0         30.0
400              31.0          32.0         33.0
500              34.0          35.0         36.0
600              37.0          38.0         39.0
700              40.0          41.0         42.0
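
If the workbook is not at hand, a frame with the same shape can be built in memory. This is only a sketch: the values simply count from 1 to 42 as in the listing above, and it assumes the date labels are strings and the characteristic labels are integers, which may differ from what read_excel returns for a real file.

```python
import numpy as np
import pandas as pd

dates = ['April 12', 'April 20']
characteristic = [100, 200, 300, 400, 500, 600, 700]
columns = ['First_index', 'Second_index', 'Third_index']

index = []
rows = []
value = 1.0
for date in dates:
    # each block starts with a date row that carries no values
    index.append(date)
    rows.append([np.nan] * len(columns))
    for c in characteristic:
        # one row per characteristic, three consecutive values per row
        index.append(c)
        rows.append([value, value + 1, value + 2])
        value += len(columns)

df = pd.DataFrame(rows, index=index, columns=columns)
```

The resulting df has an unnamed index, so df.reset_index() produces a column called "index", which is the name the code below relies on.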

You can use a mask to separate the date rows from the rows that will become columns, then

pivot

:

# move index back to column (only if not already a column)
# if already a column, use its name in the following code
# instead of "index"
tmp = df.reset_index()

# flag the date rows (no values); the other rows will be pivoted
# you could also use pd.to_numeric/pd.to_datetime on the "index"
m = tmp['First_index'].isna()

# reshape
out = (tmp[~m].assign(idx=tmp['index'].where(m).ffill())
       .pivot(index='idx', columns='index')
       .rename_axis(None)
      )

# flatten the column MultiIndex
out.columns = out.columns.map(lambda x: f'{x[0]}_{x[1]}')

Output:

          First_index_100  First_index_200  First_index_300  First_index_400  First_index_500  First_index_600  First_index_700  Second_index_100  Second_index_200  Second_index_300  Second_index_400  Second_index_500  Second_index_600  Second_index_700  Third_index_100  Third_index_200  Third_index_300  Third_index_400  Third_index_500  Third_index_600  Third_index_700
April 12              1.0              4.0              7.0             10.0             13.0             16.0             19.0               2.0               5.0               8.0              11.0              14.0              17.0              20.0              3.0              6.0              9.0             12.0             15.0             18.0             21.0
April 20             22.0             25.0             28.0             31.0             34.0             37.0             40.0              23.0              26.0              29.0              32.0              35.0              38.0              41.0             24.0             27.0             30.0             33.0             36.0             39.0             42.0
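
The comment in the code above mentions pd.to_numeric as another way to build the mask. Here is a minimal sketch of that variant. It uses a small made-up frame (two dates, two characteristics, hypothetical values) and assumes that every date label fails numeric parsing while every characteristic label parses as a number.

```python
import numpy as np
import pandas as pd

# small stand-in for df.reset_index(); the values are made up for illustration
tmp = pd.DataFrame({
    'index': ['April 12', 100, 200, 'April 20', 100, 200],
    'First_index': [np.nan, 1.0, 3.0, np.nan, 5.0, 7.0],
    'Second_index': [np.nan, 2.0, 4.0, np.nan, 6.0, 8.0],
})

# labels that cannot be parsed as numbers are treated as date rows
is_date = pd.to_numeric(tmp['index'], errors='coerce').isna()

out = (
    tmp[~is_date]
    # carry each date label down to the value rows beneath it
    .assign(idx=tmp['index'].where(is_date).ffill())
    .pivot(index='idx', columns='index')
    .rename_axis(None)
)

# flatten the column MultiIndex into names like "First_index_100"
out.columns = [f'{name}_{val}' for name, val in out.columns]
```

Unlike the isna() check on First_index, this mask does not depend on the date rows being empty, so it should still work if a value cell happens to be missing.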
