How to concatenate the lower column index and the unique row index in a DataFrame


The following code generates a DataFrame:

import pandas as pd
import tabula

page_number = "1"
pdf_url = "https://usviber.org/wp-content/uploads/2023/12/A23-OCT.pdf"

# Reads the PDF

tables = tabula.read_pdf(pdf_url, pages=page_number)

df = tables[1]

# Selects relevant columns and rows

numeric_columns = df.select_dtypes(include=["number"])
df = df.drop(numeric_columns.columns[(numeric_columns < 0).any()], axis=1)
df = df.loc[2:13, :].iloc[:, :5]

# Use the first column (the month labels) as the row index

df.set_index(df.columns[0], inplace=True)

# Rename columns based on year

df.columns = pd.MultiIndex.from_product(
    [["St Thomas", "St. Croix"], ["2022", "2023"]], names=["Island", "Year"]
)

# Map the index to uppercase and extract the first 3 characters

df.index = df.index.map(lambda x: str(x).upper()[:3])
df.index.set_names("Month", inplace=True)

This is the DataFrame it produces:

print(df)
Island St Thomas         St. Croix        
Year        2022    2023      2022    2023
Month                                     
JAN       55,086  60,470    11,550  12,755
FEB       57,929  56,826    12,441  13,289
MAR       72,103  64,249    14,094  15,880
APR       67,469  56,321    12,196  13,092
MAY       60,092  49,534    13,385  16,497
JUN       67,026  56,950    14,009  15,728
JUL       66,353  61,110    13,768  16,879
AUG       50,660  42,745    10,673  12,102
SEP       24,507  25,047     6,826   6,298
OCT       34,025  34,462    10,351   9,398
NOV       44,500     NaN     9,635     NaN
DEC       58,735     NaN    12,661     NaN

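What I want is the island names as the row index and the month and year concatenated as the column names, giving a dataset with 2 rows and 24 columns. So the first row is St Thomas. The first column is JAN2022, with a value of 55,086. The next column is FEB2022, with a value of 57,929, and so on through DEC2023. The second row is St. Croix, with the corresponding values and time periods as above. How can I do this?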

python pandas dataframe multi-index
1 Answer

If I understand correctly, you can stack, transpose, and then flatten the MultiIndex columns:

out = df.stack().T
out.columns = out.columns.map(lambda x: f'{x[0]}{x[1]}')
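
Note that stacking and transposing leaves the columns grouped month by month (JAN2022, JAN2023, FEB2022, ...), and depending on the pandas version, stack may drop months whose values are all NaN (here NOV and DEC of 2023). If the chronological order from the question is needed (JAN2022 through DEC2022, then JAN2023 onward), one possible variation is to unstack the Year level and reindex against an explicit column order. The sketch below is only an illustration: it assumes a hand-built three-month sample in place of the frame parsed from the PDF, and the names months, islands, years, and order are introduced here for the example.

```python
import pandas as pd

# Hypothetical three-month sample standing in for the table read from the PDF.
columns = pd.MultiIndex.from_product(
    [["St Thomas", "St. Croix"], ["2022", "2023"]], names=["Island", "Year"]
)
df = pd.DataFrame(
    [
        ["55,086", "60,470", "11,550", "12,755"],
        ["57,929", "56,826", "12,441", "13,289"],
        ["72,103", "64,249", "14,094", "15,880"],
    ],
    index=pd.Index(["JAN", "FEB", "MAR"], name="Month"),
    columns=columns,
)

# Remember the original orderings so nothing depends on how pandas sorts.
months = list(df.index)
islands = list(df.columns.get_level_values("Island").unique())
years = list(df.columns.get_level_values("Year").unique())

# Rows become islands; columns become (Month, Year) pairs.
out = df.T.unstack("Year")

# Reorder explicitly: every month of the first year, then the next year.
order = pd.MultiIndex.from_tuples(
    [(m, y) for y in years for m in months], names=["Month", "Year"]
)
out = out.reindex(index=islands, columns=order)

# Flatten the two column levels into labels such as "JAN2022".
out.columns = [f"{m}{y}" for m, y in out.columns]
print(out)
```

Because the columns are reindexed against the full list of month and year pairs, months with no data yet remain in the result as NaN columns, so the full table from the question should come out with 2 rows and 24 columns.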