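I'm fairly sure my problem comes from how I set up the initial layout of my data, but I'm at a loss.
The end result I want is a report for each company, with the columns reorganized into rows.
The original file looks like this: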
Symbol | Name | Exchange | RBICS Economy | Sales0 | Sales1 | Sales2 | NetInc0 | NetInc1 | NetInc2 | ... |
---|---|---|---|---|---|---|---|---|---|---|
A | Agilent Technologies Inc. | NYSE | Healthcare | 6319 | 5339 | 5163 | 438 | 232 | 724 | ... |
AA | Alcoa Corp | NYSE | Non-Energy Materials | 12437 | 9372 | 10495 | 429 | -170 | -1125 | ... |
AABB | Asia Broadband Inc. | NYSE | Finance | 2.765 | 22.622 | 8.329 | 4.459 | 68.155 | 13.168 | ... |
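*Note: Sales0, Sales1, Sales2, ... hold the sales figures for 2023, 2022, 2021, ...; the other column groups follow the same pattern.*
What I want is this:
*(I don't care where the first few descriptive columns go, i.e. Symbol, Name, Exchange, RBICS Economy. In fact, they could be kept in a separate table.)*
For each symbol, present the data from the Sales, NetInc, ... columns item by item:
For symbol A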
Financial item | 2023 | 2022 | 2021 |
---|---|---|---|
Sales | 6319 | 5339 | 5163 |
NetInc | 438 | 232 | 724 |
For symbol AA
Financial item | 2023 | 2022 | 2021 |
---|---|---|---|
Sales | 12437 | 9372 | 10495 |
NetInc | 429 | -170 | -1125 |
I should mention that I'm trying to display the output in Streamlit, but I'm open to other options.
My best attempt:
sales_columns = ['Sales0', 'Sales1', 'Sales2', 'Sales3', 'Sales4', 'Sales5', 'Sales6', 'Sales7', 'Sales8', 'Sales9', 'Sales10', 'Sales11']
net_income_columns = ['NetInc0', 'NetInc1', 'NetInc2', 'NetInc3', 'NetInc4', 'NetInc5', 'NetInc6', 'NetInc7', 'NetInc8', 'NetInc9', 'NetInc10', 'NetInc11']
ebit_columns = ['EBIT0', 'EBIT1', 'EBIT2', 'EBIT3', 'EBIT4', 'EBIT5', 'EBIT6', 'EBIT7', 'EBIT8', 'EBIT9', 'EBIT10', 'EBIT11']
equity_columns = ['Equity0', 'Equity1', 'Equity2', 'Equity3', 'Equity4', 'Equity5', 'Equity6', 'Equity7', 'Equity8', 'Equity9', 'Equity10', 'Equity11']
tangible_assets_columns = ['TangibleAssets0', 'TangibleAssets1', 'TangibleAssets2', 'TangibleAssets3', 'TangibleAssets4', 'TangibleAssets5', 'TangibleAssets6', 'TangibleAssets7', 'TangibleAssets8', 'TangibleAssets9', 'TangibleAssets10', 'TangibleAssets11']
# List of column groups
column_groups = [sales_columns, net_income_columns, ebit_columns, equity_columns, tangible_assets_columns]
# List of years
years = ['2023', '2022', '2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013']
# List to store DataFrames for each company
result_tables = []
# Iterate over each company
for company in df['Symbol'].unique():
    # Filter the DataFrame for the current company
    company_df = df[df['Symbol'] == company]
    # Melt each column group separately
    melted_dfs = []
    for group in column_groups:
        melted_df = pd.melt(company_df, id_vars=['Symbol'], value_vars=group, var_name='Variable', value_name=company)
        melted_dfs.append(melted_df)
    # Concatenate melted DataFrames along columns
    result_table = pd.concat(melted_dfs, axis=1)
    # Add a 'Group Item' column to indicate the column group
    result_table['Group Item'] = result_table['Variable'].apply(lambda x: x[:-1])
    # Reorder columns
    result_table = result_table[['Group Item', company] + years]
    # Drop the 'Variable' column
    result_table = result_table.drop(columns='Variable')
    # Append the result to the list
    result_tables.append(result_table)
# Display the resulting tables for each company
for idx, result_table in enumerate(result_tables, 1):
    print(f"\nTable for Company {idx}:\n{result_table}")
I get a KeyError: 'Group Item'
The above exception was the direct cause of the following exception:
3799 ... 3800 except KeyError: 3801 ...
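Independently of the KeyError, two details in the attempt above are worth checking. Slicing with `x[:-1]` removes only one character, so `Sales10` becomes `Sales1` rather than `Sales`, and the `years` list has 11 entries for 12 column suffixes. Below is a minimal sketch of a more robust split, assuming suffix 0 corresponds to 2023 as stated in the note; the names `split_column`, `year_for` and `BASE_YEAR` are made up for illustration.

```python
import re

BASE_YEAR = 2023  # assumption: suffix 0 is 2023, per the note in the question


def split_column(name):
    """Split 'Sales10' into ('Sales', 10); return None if there is no numeric suffix."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def year_for(offset):
    """Map a column suffix to its year: 0 -> 2023, 1 -> 2022, and so on."""
    return BASE_YEAR - offset


item, offset = split_column("Sales10")
print(item, year_for(offset))  # Sales 2013
print("Sales10"[:-1])          # Sales1 (the single-character slice keeps a digit)
```

Descriptive columns such as `Symbol` have no trailing digits, so `split_column` returns `None` for them and they can be skipped.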
import re

import pandas as pd

# melt the DataFrame on "Symbol", group by "Symbol" and "Financial item", then sum the values in each group
df = (
    df.melt("Symbol", var_name="Financial item")
    .groupby(["Symbol", "Financial item"])
    .sum()
    .reset_index()
)
# remove rows that do not have numeric values
df["value"] = pd.to_numeric(df["value"], errors="coerce")
df = df[~df["value"].isna()]
# define a "year" column based on the trailing numbers from "Financial item" values and a dict `years`
years = {0: 2023, 1: 2022, 2: 2021}
df["year"] = (
    df["Financial item"].apply(lambda x: int(re.findall(r"\d+$", x)[-1])).map(years)
)
# remove the trailing numbers from "Financial item" values
df["Financial item"] = df["Financial item"].apply(lambda x: re.sub(r"\d+$", "", x))
# for each symbol, print the symbol, then pivot the table to have years as columns
for symbol in df["Symbol"].drop_duplicates().to_numpy():
    print("Symbol", symbol)
    symbol_df = df.drop(columns="Symbol").loc[df["Symbol"] == symbol]
    symbol_df = symbol_df.pivot_table(
        index="Financial item", columns="year", values="value"
    ).reset_index()
    symbol_df.columns.name = None
    # Reorder columns
    symbol_df = symbol_df[["Financial item"] + list(years.values())]
    print(symbol_df)
This prints:
Symbol A
  Financial item    2023    2022    2021
0         NetInc   438.0   232.0   724.0
1          Sales  6319.0  5339.0  5163.0
Symbol AA
  Financial item     2023    2022     2021
0         NetInc    429.0  -170.0  -1125.0
1          Sales  12437.0  9372.0  10495.0
Symbol AABB
  Financial item   2023    2022    2021
0         NetInc  4.459  68.155  13.168
1          Sales  2.765  22.622   8.329
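As an alternative sketch, `pd.wide_to_long` can do the suffix splitting in one step, and since the question mentions Streamlit, the per-symbol tables can then be passed to `st.dataframe`. The sample frame below contains only the three rows and two items shown in the question. The names `build_tables`, `render`, `STUBS` and `YEARS` are assumptions made for illustration, and Streamlit is imported lazily so the reshaping part runs without it.

```python
import pandas as pd

# Assumption: suffixes 0, 1, 2 map to these years, as in the question's note.
YEARS = {0: 2023, 1: 2022, 2: 2021}
STUBS = ["Sales", "NetInc"]

df = pd.DataFrame(
    {
        "Symbol": ["A", "AA", "AABB"],
        "Sales0": [6319, 12437, 2.765],
        "Sales1": [5339, 9372, 22.622],
        "Sales2": [5163, 10495, 8.329],
        "NetInc0": [438, 429, 4.459],
        "NetInc1": [232, -170, 68.155],
        "NetInc2": [724, -1125, 13.168],
    }
)


def build_tables(wide, stubs=STUBS, years=YEARS):
    """Return {symbol: table} with financial items as rows and years as columns."""
    # Keep only "Symbol" and the <stub><digits> columns; descriptive columns stay out.
    value_cols = [
        c for c in wide.columns
        if any(c.startswith(s) and c[len(s):].isdigit() for s in stubs)
    ]
    long = pd.wide_to_long(
        wide[["Symbol"] + value_cols], stubnames=stubs, i="Symbol", j="offset"
    )
    tables = {}
    for symbol in wide["Symbol"]:
        # Rows for one symbol, indexed by offset (0, 1, 2, ...), one column per item.
        per_symbol = long.xs(symbol, level="Symbol")[stubs].sort_index()
        table = per_symbol.T.rename(columns=years)
        table.index.name = "Financial item"
        table.columns.name = None
        tables[symbol] = table
    return tables


def render(tables):
    """Show one table per symbol in a Streamlit app (needs Streamlit installed)."""
    import streamlit as st

    for symbol, table in tables.items():
        st.subheader(f"Symbol {symbol}")
        # String column labels are a cautious choice for display.
        st.dataframe(table.rename(columns=str))


tables = build_tables(df)
print(tables["AA"])
```

In a Streamlit script, calling `render(tables)` at the end and launching the file with `streamlit run` should show one table per symbol. For the full data set, `STUBS` and `YEARS` would need to be extended to cover the remaining items and suffixes.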