Loop through all pages


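I'm making a free people-search app. It will look up everyone in Texas by name and find their home address. I've only just started, and I can't figure out how to loop through all of the pages. Is there a good, reliable way to do this? Or am I just being dumb? This is my first big project and I could really use some help. Thanks!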

import requests
from time import sleep

name = input("Enter full name: ")

def namesearchwilcounty(name):
    url = "https://search.wcad.org/ProxyT/Search/Properties/quick/"
    params = {
        "f": name,
        "pn": 1,
        "st": 4,
        "so": "desc",
        "pt": "RP;PP;MH;NR",
        "ty": "2024"
    }

    response = requests.get(url, params=params)

    if response.status_code == 200:
        data = response.json()
        if data and "ResultList" in data and data["ResultList"]:
            print("Situs Addresses and Owner Names:")
            processed_results = set()  # Set to store processed results
            for item in data["ResultList"]:
                situs_address = item["SitusAddress"]
                owner_name = item["OwnerName"]
                total_page_count = item.get("TotalPageCount", 1)
                current_page = 1
                while current_page <= total_page_count:
                    params["pn"] = current_page  # Update the page number in params
                    response = requests.get(url, params=params)
                    if response.status_code == 200:
                        page_data = response.json()
                        for item in page_data["ResultList"]:
                            situs_address = item["SitusAddress"]
                            owner_name = item["OwnerName"]
                            result = f"{owner_name} {situs_address}"
                            if result not in processed_results:
                                print(f"Page {current_page}: {result}")
                                processed_results.add(result)  # Add result to the set
                        current_page += 1
                        sleep(1)  # Add a delay between requests
                    else:
                        print(f"Error: {response.status_code} - {response.text}")
                        break
        else:
            print("No data found in the response.")
    else:
        print(f"Error: {response.status_code} - {response.text}")


namesearchwilcounty(name)
Answer

It seems that the pn parameter is the page number (1-based), and that there are no more pages once RecordCount is zero.

So, simplifying your code, you could do this:

import requests
import time

URL = "https://search.wcad.org/ProxyT/Search/Properties/quick/"

def search(name):
    params = {
        "f": name,
        "pn": 1,
        "st": 4,
        "so": "desc",
        "pt": "RP;PP;MH;NR",
        "ty": "2024"
    }
    with requests.Session() as session:
        while True:
            response = session.get(URL, params=params)
            response.raise_for_status()
            data = response.json()
            # An empty page (RecordCount == 0) means we are past the last page
            if data["RecordCount"] == 0:
                print("No more records")
                break
            current_page = data["CurrentPage"]
            print(current_page)
            params["pn"] = current_page + 1
            time.sleep(1) # try to avoid HTTP 429

search("Smith")
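A possible refinement, sketched under assumptions: the loop above only prints page numbers, while the question wants owner names and addresses. The version below separates the paging rule from the HTTP call so it can be tested without hitting the server. It relies on the answer's observation that `pn` is 1-based and that a page with `RecordCount` of 0 marks the end, and on the question's assumption that each `ResultList` entry has `OwnerName` and `SitusAddress` keys. `iter_results` and `make_fetcher` are hypothetical helper names, not part of any API.

```python
import time
from typing import Callable, Iterator, Tuple


def iter_results(fetch_page: Callable[[int], dict]) -> Iterator[Tuple[str, str]]:
    """Yield each unique (owner_name, situs_address) pair across all pages.

    fetch_page(pn) must return the decoded JSON for 1-based page pn.
    """
    seen = set()
    page = 1
    while True:
        data = fetch_page(page)
        results = data.get("ResultList") or []
        # Stop at the first empty page. The empty-list check is an extra
        # safeguard (an assumption, not from the answer) in case RecordCount
        # stays non-zero on a page with no results.
        if data.get("RecordCount", 0) == 0 or not results:
            return
        for item in results:
            key = (item["OwnerName"], item["SitusAddress"])
            if key not in seen:  # skip duplicates, like the question's set
                seen.add(key)
                yield key
        page += 1


def make_fetcher(session, url, base_params, pause=1.0, sleep=time.sleep):
    """Build fetch_page(pn) on top of a requests-style session."""
    def fetch_page(pn):
        params = dict(base_params, pn=pn)  # copy, so base_params is not mutated
        response = session.get(url, params=params)
        response.raise_for_status()
        sleep(pause)  # fixed delay between requests, as in the answer
        return response.json()
    return fetch_page
```

With the answer's session and parameters, `for owner, address in iter_results(make_fetcher(session, URL, params)): print(owner, address)` would print each pair once. Injecting `sleep` keeps the delay in place for real use while letting tests skip it.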

Note:

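If this needs to be truly robust, you will need to build some kind of retry mechanism, because sleeping for a hard-coded interval is unlikely to guarantee that you never receive an HTTP 429 (Too Many Requests) at some point.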
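One way to act on that note, as a sketch: wrap each request in a retry loop that backs off exponentially (1 s, 2 s, 4 s, ... plus a little random jitter) and honours a numeric `Retry-After` header when the server sends one. `call_with_backoff`, `RetryableError` and `fetch_json` are hypothetical names; the only assumption about the HTTP client is that it behaves like a `requests.Session` (`get`, `status_code`, `headers`, `raise_for_status`, `json`).

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Signals a failure worth retrying, e.g. HTTP 429 Too Many Requests."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("retryable error")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RetryableError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            if exc.retry_after is not None:
                delay = exc.retry_after  # the server said how long to wait
            else:
                # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% jitter
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay * 0.1)
            sleep(delay)
    raise AssertionError("unreachable")


def fetch_json(session, url, params):
    """GET url with a requests-style session; map HTTP 429 to RetryableError."""
    response = session.get(url, params=params)
    if response.status_code == 429:
        header = response.headers.get("Retry-After", "")
        # Retry-After can also be an HTTP date; only whole seconds are handled here
        retry_after = float(header) if header.isdigit() else None
        raise RetryableError(retry_after)
    response.raise_for_status()  # other HTTP errors are not retried
    return response.json()
```

Each page could then be fetched with `call_with_backoff(lambda: fetch_json(session, URL, params))`. As an alternative, `requests` can delegate retries to urllib3 by mounting an `HTTPAdapter(max_retries=Retry(...))` on the session, with 429 listed in `status_forcelist`.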
