Here is a small Python program that retrieves tax data from the treasury.gov API:
import pandas as pd
import treasury_gov_pandas
# ----------------------------------------------------------------------
# Download the Daily Treasury Statement "Deposits and Withdrawals of Operating Cash" table.
df = treasury_gov_pandas.update_records(
url = 'https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/dts/deposits_withdrawals_operating_cash')
# Parse the date and amount columns into proper dtypes.
df['record_date'] = pd.to_datetime(df['record_date'])
df['transaction_today_amt'] = pd.to_numeric(df['transaction_today_amt'])
# Keep deposit rows whose category mentions "Tax" or "FTD" (Federal Tax Deposits).
tmp = df[(df['transaction_type'] == 'Deposits') & ((df['transaction_catg'].str.contains('Tax')) | (df['transaction_catg'].str.contains('FTD'))) ]
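To show which rows the boolean mask in the last line keeps, here is a minimal sketch on a few made-up rows (the values are invented for illustration and are not real Treasury data). It assumes the same column names as the API output. The single regex `'Tax|FTD'` is equivalent to the two `str.contains` calls joined with `|`, and `na=False` makes rows with a missing category count as non-matching instead of breaking the mask.

```python
import pandas as pd

# Hypothetical sample rows mimicking the DTS columns used above (values made up).
df = pd.DataFrame({
    'transaction_type': ['Deposits', 'Deposits', 'Withdrawals', 'Deposits'],
    'transaction_catg': ['Taxes - Corporate Income',
                         "FTD's Received (Table IV)",
                         'Taxes - Corporate Income',
                         'Public Debt Cash Issues'],
    'transaction_today_amt': [237, 2515, 10, 500],
})

# Same logic as the original filter: deposits whose category mentions "Tax" or "FTD".
# The regex 'Tax|FTD' replaces the two OR-ed contains() calls;
# na=False treats a missing category as "no match".
is_deposit = df['transaction_type'] == 'Deposits'
is_tax = df['transaction_catg'].str.contains('Tax|FTD', na=False)
tmp = df[is_deposit & is_tax]
print(tmp['transaction_catg'].tolist())
# ['Taxes - Corporate Income', "FTD's Received (Table IV)"]
```

The withdrawal row and the non-tax deposit row are both dropped, which matches the intent of the original filter.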
The program uses the following library to download the data:
https://github.com/dharmatech/treasury-gov-pandas.py
The resulting data looks like this:
>>> tmp.tail(20).drop(columns=['table_nbr', 'table_nm', 'src_line_nbr', 'record_fiscal_year', 'record_fiscal_quarter', 'record_calendar_year', 'record_calendar_quarter', 'record_calendar_month', 'record_calendar_day', 'transaction_mtd_amt', 'transaction_fytd_amt', 'transaction_catg_desc', 'account_type', 'transaction_type'])
record_date transaction_catg transaction_today_amt
371266 2024-04-03 DHS - Customs and Certain Excise Taxes 84
371288 2024-04-03 Taxes - Corporate Income 237
371289 2024-04-03 Taxes - Estate and Gift 66
371290 2024-04-03 Taxes - Federal Unemployment (FUTA) 10
371291 2024-04-03 Taxes - IRS Collected Estate, Gift, misc 23
371292 2024-04-03 Taxes - Miscellaneous Excise 41
371293 2024-04-03 Taxes - Non Withheld Ind/SECA Electronic 1786
371294 2024-04-03 Taxes - Non Withheld Ind/SECA Other 2315
371295 2024-04-03 Taxes - Railroad Retirement 3
371296 2024-04-03 Taxes - Withheld Individual/FICA 12499
371447 2024-04-04 DHS - Customs and Certain Excise Taxes 82
371469 2024-04-04 Taxes - Corporate Income 288
371470 2024-04-04 Taxes - Estate and Gift 59
371471 2024-04-04 Taxes - Federal Unemployment (FUTA) 8
371472 2024-04-04 Taxes - IRS Collected Estate, Gift, misc 127
371473 2024-04-04 Taxes - Miscellaneous Excise 17
371474 2024-04-04 Taxes - Non Withheld Ind/SECA Electronic 1905
371475 2024-04-04 Taxes - Non Withheld Ind/SECA Other 1092
371476 2024-04-04 Taxes - Railroad Retirement 1
371477 2024-04-04 Taxes - Withheld Individual/FICA 2871
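Instead of dropping fourteen columns by name, as in the long `drop(columns=[...])` call above, it can be shorter to select the three columns of interest. A minimal sketch on a made-up stand-in for `tmp` (the extra columns and their values are hypothetical placeholders for the ones being dropped):

```python
import pandas as pd

# Hypothetical stand-in for `tmp`, including a couple of the extra API columns.
tmp = pd.DataFrame({
    'record_date': pd.to_datetime(['2024-04-03', '2024-04-04']),
    'transaction_catg': ['Taxes - Corporate Income', 'Taxes - Corporate Income'],
    'transaction_today_amt': [237, 288],
    'table_nbr': ['II', 'II'],                      # made-up placeholder values
    'account_type': ['placeholder', 'placeholder'],  # made-up placeholder values
})

# Keep only the columns we want to inspect rather than listing the ones to drop.
cols = ['record_date', 'transaction_catg', 'transaction_today_amt']
print(tmp[cols].tail(20))
```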
The dataframe contains data going back to 2005:
>>> tmp.drop(columns=['table_nbr', 'table_nm', 'src_line_nbr', 'record_fiscal_year', 'record_fiscal_quarter', 'record_calendar_year', 'record_calendar_quarter', 'record_calendar_month', 'record_calendar_day', 'transaction_mtd_amt', 'transaction_fytd_amt', 'transaction_catg_desc', 'account_type', 'transaction_type'])
record_date transaction_catg transaction_today_amt
2 2005-10-03 Customs and Certain Excise Taxes 127
7 2005-10-03 Estate and Gift Taxes 74
10 2005-10-03 FTD's Received (Table IV) 2515
12 2005-10-03 Individual Income and Employment Taxes, Not Wi... 353
21 2005-10-03 FTD's Received (Table IV) 15708
... ... ... ...
371473 2024-04-04 Taxes - Miscellaneous Excise 17
371474 2024-04-04 Taxes - Non Withheld Ind/SECA Electronic 1905
371475 2024-04-04 Taxes - Non Withheld Ind/SECA Other 1092
371476 2024-04-04 Taxes - Railroad Retirement 1
371477 2024-04-04 Taxes - Withheld Individual/FICA 2871
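The output shows that the category labels changed over the years (for example "Estate and Gift Taxes" in 2005 versus "Taxes - Estate and Gift" in 2024), so each spelling will become its own layer in a stacked chart. A quick check of how many layers, and therefore colors, the chart will need, sketched on made-up rows that reuse labels from the output above:

```python
import pandas as pd

# Hypothetical rows reusing category labels that appear in the output above.
tmp = pd.DataFrame({
    'record_date': pd.to_datetime(['2005-10-03', '2005-10-03',
                                   '2024-04-04', '2024-04-04']),
    'transaction_catg': ['Estate and Gift Taxes', "FTD's Received (Table IV)",
                         'Taxes - Estate and Gift', 'Taxes - Corporate Income'],
})

# Each distinct label becomes one layer of the stacked bar chart.
n_layers = tmp['transaction_catg'].nunique()

# First and last date on which each label appears.
span = tmp.groupby('transaction_catg')['record_date'].agg(['min', 'max'])
print(n_layers)
print(span)
```

On the real `tmp`, the same two lines show whether old and new labels overlap in time or simply replace one another.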
I'd like to plot this data as a stacked bar chart.
What's a good way to do this with Bokeh?
Here's one approach:
import pandas as pd
import treasury_gov_pandas
from bokeh.plotting import figure, show
import bokeh.models
import bokeh.palettes
# ----------------------------------------------------------------------
df = treasury_gov_pandas.update_records(
url = 'https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/dts/deposits_withdrawals_operating_cash')
df['record_date'] = pd.to_datetime(df['record_date'])
df['transaction_today_amt'] = pd.to_numeric(df['transaction_today_amt'])
# ----------------------------------------------------------------------
tmp = df[(df['transaction_type'] == 'Deposits') & ((df['transaction_catg'].str.contains('Tax')) | (df['transaction_catg'].str.contains('FTD'))) ]
# ----------------------------------------------------------------------
# Sum amounts per day and category, then pivot so each category is one column.
tmp_agg = tmp.groupby(['record_date', 'transaction_catg'])['transaction_today_amt'].sum().reset_index()
tmp_agg['record_date'] = tmp_agg['record_date'].dt.date
# Days on which a category has no entry become 0 instead of NaN.
pivot_df = tmp_agg.pivot(index='record_date', columns='transaction_catg', values='transaction_today_amt').fillna(0)
p = figure(title='TGA Taxes', sizing_mode='stretch_both', x_axis_type='datetime', x_axis_label='record_date', y_axis_label='amt')
# Bars are half a day wide on the datetime axis.
width = pd.Timedelta(days=0.5)
# vbar_stack needs exactly one color per stacked column; this assumes at most 20 categories.
stackers = pivot_df.columns.tolist()
colors = bokeh.palettes.Category20[20][:len(stackers)]
p.vbar_stack(stackers=stackers, x='record_date', width=width, source=pivot_df, color=colors, legend_label=stackers)
p.xaxis.ticker = bokeh.models.DatetimeTicker(desired_num_ticks=30)
p.legend.click_policy = 'hide'
p.legend.location = 'top_left'
show(p)
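With daily data back to 2005 the chart has thousands of half-day-wide bars, which can be hard to read when zoomed out. If monthly totals are acceptable (an assumption about what the chart is for, not part of the approach above), one option is to resample before plotting. The sketch below uses made-up numbers; it swaps in `pivot_table` for the `groupby` + `pivot` pair and then sums to calendar months with `resample('MS')`. The resulting frame has the same shape as `pivot_df` (dates as the index named `record_date`, one column per category), so it could be passed to `vbar_stack` the same way, with a wider bar.

```python
import pandas as pd

# Hypothetical daily rows in the same shape as `tmp` above (amounts made up).
tmp = pd.DataFrame({
    'record_date': pd.to_datetime(['2024-03-28', '2024-03-28', '2024-04-03',
                                   '2024-04-03', '2024-04-04']),
    'transaction_catg': ['Taxes - Corporate Income', 'Taxes - Estate and Gift',
                         'Taxes - Corporate Income', 'Taxes - Estate and Gift',
                         'Taxes - Corporate Income'],
    'transaction_today_amt': [100, 50, 237, 66, 288],
})

# One row per day, one column per category; duplicates are summed and
# missing day/category combinations are filled with 0.
daily = tmp.pivot_table(index='record_date', columns='transaction_catg',
                        values='transaction_today_amt', aggfunc='sum',
                        fill_value=0)

# Collapse to calendar-month totals; 'MS' labels each row by the month start.
monthly = daily.resample('MS').sum().rename_axis('record_date')
print(monthly)
```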