Is there a way to retrieve search results from a public domain in Python?


I'm looking at a page like this:

https://disclosures-clerk.house.gov/FinancialDisclosure

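Using the "Search" function in the left-hand box, I would like to select a year from the "Filing Year" dropdown and retrieve, in Python, the PDFs that the results link to.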

For example, for 2024 I would like to retrieve the PDFs linked from the 140 entries returned. Ideally, I could also filter by "Filing". Is there any way to do this?

python html python-requests
1 Answer

Try:

import requests
from bs4 import BeautifulSoup

data = {
    "LastName": "",
    "FilingYear": "2022",  # <-- change year here
    "State": "",
    "District": "",
}

api_url = (
    "https://disclosures-clerk.house.gov/FinancialDisclosure/ViewMemberSearchResult"
)

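# Fail early on an HTTP error instead of parsing an error page.
response = requests.post(api_url, data=data, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")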

for a in soup.select('a[href$=".pdf"]'):
    print(a.text, a["href"])

Prints:


...

Wittman, Hon.. Robert J.  public_disc/ptr-pdfs/2022/20021150.pdf
Wittman, Hon.. Robert J.  public_disc/ptr-pdfs/2022/20021344.pdf
Wittman, Hon.. Robert J.  public_disc/ptr-pdfs/2022/20021515.pdf
Wittman, Hon.. Robert J.  public_disc/ptr-pdfs/2022/20021679.pdf
Wittman, Hon.. Robert J.  public_disc/ptr-pdfs/2022/20021807.pdf
Wittman, Hon.. Robert J.  public_disc/ptr-pdfs/2022/20022101.pdf
Wittman, Hon.. Robert J.  public_disc/financial-pdfs/2022/30018513.pdf
Womack, Hon.. Steve  public_disc/financial-pdfs/2022/10054531.pdf
Womack, Hon.. Steve  public_disc/ptr-pdfs/2022/20022049.pdf
Yakym, Hon.. Rudy III. public_disc/financial-pdfs/2022/10052905.pdf
Yakym, Hon.. Rudy III. public_disc/ptr-pdfs/2022/20022181.pdf
Yakym, Hon.. Rudy III. public_disc/financial-pdfs/2022/30018183.pdf
Zinke, Hon.. Ryan K.  public_disc/financial-pdfs/2022/10053424.pdf
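
To go from the printed links to files on disk, something like the following might work. It is only a sketch under a few assumptions: the relative hrefs are taken to resolve against the search page URL (so `public_disc/...` ends up directly under the site root), and the folder name in the path (`ptr-pdfs` vs. `financial-pdfs`) is used as a rough stand-in for the filing type, which I have not checked against the "Filing" column. The names `absolute_pdf_url`, `keep_folder` and `download_pdfs` are made up for this example, and `fetch` is any callable that returns the bytes for a URL.

```python
from pathlib import Path
from urllib.parse import urljoin

# Assumption: hrefs in the result table are relative to the search page URL.
BASE_URL = "https://disclosures-clerk.house.gov/FinancialDisclosure"


def absolute_pdf_url(href, base=BASE_URL):
    """Turn a relative href from the result table into a full URL."""
    return urljoin(base, href)


def keep_folder(hrefs, folder):
    """Keep only hrefs that have `folder` as one of their path segments."""
    return [h for h in hrefs if f"/{folder}/" in f"/{h}"]


def download_pdfs(hrefs, fetch, out_dir="pdfs"):
    """Save each PDF into out_dir, named after the last part of its URL.

    `fetch` is a hypothetical hook: any callable taking a URL and
    returning the response body as bytes.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for href in hrefs:
        url = absolute_pdf_url(href)
        target = out / url.rsplit("/", 1)[-1]
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved


# Example use with requests, reusing `soup` from the answer above:
#   hrefs = [a["href"] for a in soup.select('a[href$=".pdf"]')]
#   download_pdfs(keep_folder(hrefs, "ptr-pdfs"),
#                 lambda url: requests.get(url, timeout=30).content)
```

Passing `fetch` in keeps the download step independent of the HTTP library, so a delay between requests or a retry can be added there without touching the rest.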