How can I optimize my SQLAlchemy instance?


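I am using SQLAlchemy to fetch a large amount of data. Without going into too much detail, we have a "delivery" table that links to several other tables, which in turn link to more tables. I need to access data from all of these linked tables to create a delivery JSON file.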

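At the moment, the way I access the elements runs very slowly. Does anyone have any suggestions (specific or general) for improving performance? The code that accesses the data is below. I would add all of the table definitions, but they are quite large. The table definitions are all standard; nothing such as lazy is set on them.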

    # Get the deliveries for the specified date
    delivery_rs = session.query(Delivery).join(Order) \
        .filter(and_(Delivery.DespatchDateTime.between(start_date, end_date), Order.ProductionSite == site_map.get(site))).all()
    # Setup our rowcount for the metadata later
    rowcount = 0

    # Go through each delivery in the resultset, formulate the full job/delivery/client/customer data and add it to the data array
    for delivery in delivery_rs:
        # Add to our rowcount
        rowcount = rowcount + 1

        # Add the jobs to our job array
        job_deliveries = delivery.JobDeliveries
        jobs = []
        quantity = 0
        for job_delivery in job_deliveries:
            job = job_delivery.Job
            web_ref = job.ClientJobReference
            if web_ref and not re.match(r'^CCW_', web_ref):
                web_ref = "" 
            elif web_ref:
                web_ref = re.sub(r'^CCW_', '', web_ref)
            jobs.append({
                "web_ref":      "CCW_{}".format(web_ref) if web_ref else "",
                "name":         job.JobName,
                # The artwork is stored in S3, so provide a link
                "thumbnail":    "https://example.com/{}.png".format(web_ref) if web_ref else ""
            })

            quantity = quantity + job_delivery.Quantity

        # Format our delivery data
        if delivery.AddressContact:
            address_contact = delivery.AddressContact
            contact_data =  {
                        "title":    title_map.get(address_contact.Title),
                        "name":     address_contact.ContactName,
                        "email":    address_contact.ContactEmail,
                        "phone":    address_contact.ContactNumber
                    }
        else:
            contact_data = {}
        
        order = delivery.Order
        client = order.Client
        delivery_method = delivery.DeliveryMethod
        address = delivery.Address
        
        result["data"].append(
            {
                "order_number": order.OrderSequenceId,
                "quantity":     quantity,
                "method":       delivery_method.Name,
                "client":       client.Name,
                "end_client":   client.EndCustomer,
                "jobs":         jobs,
                "contact":      contact_data,
                "address": {
                    "business": address.BusinessName,
                    "postcode": address.PostCode,
                    "town":     address.Town,
                    "county":   address.County,
                    "country":  address.Country.Name,
                    "lines": [
                        address.AddressLine1,
                        address.AddressLine2
                    ]
                }
            }
        )

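I have not tried much yet: there is a lot of information out there about optimizing SQLAlchemy, but I am not sure what would actually help in my scenario. Collecting the data for roughly 100 deliveries takes 15 seconds (most of them have multiple jobs). I did try joining every table into the query, but it returned no data.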

python python-3.x sqlalchemy
1 Answer

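Eager loading all of the related data up front will probably help performance a lot. Otherwise, every time you access a relationship that has not been loaded yet, another query is executed.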

Relationship Loading with Loader Options
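To make the "another query per access" point concrete, here is a minimal sketch that counts the statements sent to an in-memory SQLite database, first with the default lazy loading and then with selectinload. The Client and Delivery models below are hypothetical stand-ins, not the tables from the question, and the helper count_queries is made up for this illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event, select
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Client(Base):  # hypothetical model, for illustration only
    __tablename__ = "client"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Delivery(Base):  # hypothetical model, for illustration only
    __tablename__ = "delivery"
    id = Column(Integer, primary_key=True)
    client_id = Column(Integer, ForeignKey("client.id"))
    client = relationship(Client)  # default loading strategy is lazy


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Seed five deliveries, each with its own client.
with Session(engine) as session:
    session.add_all([Delivery(client=Client(name=f"client {i}")) for i in range(5)])
    session.commit()

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    # Called once for every statement the engine sends to the database.
    statements.append(statement)


def count_queries(options=()):
    """Load all deliveries, touch delivery.client on each, return (query count, names)."""
    statements.clear()
    stmt = select(Delivery)
    if options:
        stmt = stmt.options(*options)
    with Session(engine) as session:
        deliveries = session.execute(stmt).scalars().all()
        names = [delivery.client.name for delivery in deliveries]
    return len(statements), names


# Lazy: one query for the deliveries, plus one more per delivery for its client.
lazy_count, lazy_names = count_queries()
# Eager: one query for the deliveries, plus one query that fetches all clients.
eager_count, eager_names = count_queries([selectinload(Delivery.client)])

print("lazy queries: ", lazy_count)
print("eager queries:", eager_count)
```

With several relationships per delivery, as in the question, the lazy count grows with every relationship touched inside the loop, which is where the time is likely going.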

In your example, it might look something like this (untested), using a SQLAlchemy 2.0 select-style query:

    from sqlalchemy import and_, select
    from sqlalchemy.orm import joinedload, selectinload

    q = select(Delivery).join(Order).where(
        and_(
            Delivery.DespatchDateTime.between(start_date, end_date),
            Order.ProductionSite == site_map.get(site),
        )
    )
    q = q.options(
        # Load each delivery's order, and that order's client, with eager JOINs.
        # Note: joinedload() adds its own JOIN; contains_eager() is the option
        # that would reuse the explicit join to Order above.
        joinedload(Delivery.Order).joinedload(Order.Client),
        # Load all these separately.
        # (The loop also reads address.Country; it needs a loader of its own.)
        selectinload(Delivery.DeliveryMethod),
        selectinload(Delivery.Address),
        selectinload(Delivery.AddressContact),
        # But here get this association object and the job too.
        selectinload(Delivery.JobDeliveries).joinedload(JobDelivery.Job),
    )
    delivery_rs = session.scalars(q).all()

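It is hard to tell what is a class, what is a relationship, and what is a value, because everything seems to be named in CamelCase, so I am not sure I got this right.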

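To check that you are getting all of the relationships, you can turn on echo=True and run the script. All of the queries should happen up front, not inside the for loop. You can add print statements or something similar inside the loop to see where each query is emitted.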
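As a rough sketch of that check: echo=True makes the engine log every statement through Python's logging module under the sqlalchemy.engine logger, so a small handler on that logger can record what is emitted while the loop runs. The Address and Delivery models and the StatementCollector class are hypothetical, written only for this illustration against an in-memory SQLite database.

```python
import logging

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship


class StatementCollector(logging.Handler):
    """Hypothetical helper: keeps every message the engine logs."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


collector = StatementCollector()
logging.getLogger("sqlalchemy.engine").addHandler(collector)

Base = declarative_base()


class Address(Base):  # hypothetical model, for illustration only
    __tablename__ = "address"
    id = Column(Integer, primary_key=True)
    town = Column(String)


class Delivery(Base):  # hypothetical model, for illustration only
    __tablename__ = "delivery"
    id = Column(Integer, primary_key=True)
    address_id = Column(Integer, ForeignKey("address.id"))
    address = relationship(Address)


# echo=True prints each statement to stdout and sends it to the logger above.
engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Delivery(address=Address(town=town)) for town in ("Leeds", "York")])
    session.commit()

with Session(engine) as session:
    stmt = select(Delivery).options(joinedload(Delivery.address))
    deliveries = session.execute(stmt).scalars().all()

    print("--- loop start ---")
    collector.messages.clear()
    towns = [delivery.address.town for delivery in deliveries]
    statements_in_loop = list(collector.messages)
    print("--- loop end ---")

# An empty list means the loop ran without sending anything to the database.
print("statements emitted inside the loop:", statements_in_loop)
```

If the joinedload option is removed from the statement, SELECTs against the address table should show up between the two markers, which is the symptom to look for in the real script.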
