SQLAlchemy - select rows that share an identifying reference but whose values differ across sequence periods


I have a table with the following structure (abbreviated):

class Valuation(Base):
    __tablename__ = 'valuation'
    id = Column(Integer, primary_key=True)
    reference = Column(BigInteger, index=True)
    value = Column(Float)
    period = Column(String)

Sample data:

reference  value  period
2433       110    2023-a
5435       120    2023-b
5435       110    2022-a
2433       100    2022-b
5435       105    2022-c
2433       100    2021-a

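Notes on the data:

  • A reference does not always have a value measurement for every distinct period sequence (year-letter). A reference may have no value for the latest measurement sequence of a period.
  • value should decrease or stay the same over time, so the maximum value for any period should not exceed the maximum value for the preceding period.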
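As a rough illustration of the second note, here is a minimal pure-Python sketch that reports where the rule is broken. It assumes that "period" in that note means the year part of the period string (the text before the hyphen) and that a year is compared with the previous year that has any data; the names rows, yearly_max and violations are made up for this example.

```python
from collections import defaultdict

# (reference, value, period) tuples copied from the sample data
rows = [
    (2433, 110, "2023-a"),
    (5435, 120, "2023-b"),
    (5435, 110, "2022-a"),
    (2433, 100, "2022-b"),
    (5435, 105, "2022-c"),
    (2433, 100, "2021-a"),
]


def yearly_max(rows):
    """Map reference -> {year: highest value recorded in that year}."""
    out = defaultdict(dict)
    for reference, value, period in rows:
        year = period.split("-")[0]  # assumption: format is "<year>-<letter>"
        out[reference][year] = max(value, out[reference].get(year, value))
    return out


def violations(rows):
    """Return (reference, year) pairs whose yearly maximum exceeds
    the maximum of the previous year that has data."""
    found = []
    for reference, by_year in yearly_max(rows).items():
        years = sorted(by_year)  # four-digit years sort chronologically as strings
        for prev, cur in zip(years, years[1:]):
            if by_year[cur] > by_year[prev]:
                found.append((reference, cur))
    return sorted(found)


print(violations(rows))
```

For the sample data, both references break the rule in 2023, which is consistent with the rows the question expects to get back.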

I want to select every reference whose value for its most recent period is greater than the latest value of any earlier period.

For the data above, this would return:

reference  value  period
2433       110    2023-a
5435       120    2023-b
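To pin down the selection rule before translating it into a query, here is a minimal pure-Python sketch. It rests on a few assumptions that the question does not state outright: the year is the part of period before the hyphen, period strings sort chronologically as plain strings, and "greater than any earlier value" means greater than at least one of them. The names rows and flag_increases are hypothetical.

```python
from collections import defaultdict

# (reference, value, period) tuples copied from the sample data
rows = [
    (2433, 110, "2023-a"),
    (5435, 120, "2023-b"),
    (5435, 110, "2022-a"),
    (2433, 100, "2022-b"),
    (5435, 105, "2022-c"),
    (2433, 100, "2021-a"),
]


def flag_increases(rows):
    """Return (reference, value, period) for each reference whose newest
    measurement exceeds the latest measurement of at least one earlier year."""
    # latest[reference][year] = (period, value) of the last sequence in that year
    latest = defaultdict(dict)
    for reference, value, period in rows:
        year = period.split("-")[0]
        current = latest[reference].get(year)
        if current is None or period > current[0]:
            latest[reference][year] = (period, value)

    flagged = []
    for reference, by_year in latest.items():
        newest_year = max(by_year)
        newest_period, newest_value = by_year[newest_year]
        earlier = [v for y, (_, v) in by_year.items() if y < newest_year]
        if any(newest_value > v for v in earlier):
            flagged.append((reference, newest_value, newest_period))
    return sorted(flagged)


print(flag_increases(rows))
```

Note that for reference 5435 the 2022 comparison value is 105 (from 2022-c, the last sequence of that year), not 110 (from 2022-a).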

Reviewing this suggests that the aliased approach would help, but I'm somewhat at a loss as to how best to structure it.

Where I've got to so far:

value2022 = aliased(Valuation, name="value2022")
value2021 = aliased(Valuation, name="value2021")
query = (
    db.query(Valuation)
    .outerjoin(value2022, (
            (Valuation.reference == value2022.reference)
            & (Valuation.value > value2022.value)
            & (Valuation.period.startswith("2023"))
            & (value2022.period.startswith("2022"))
        )
    )
    .outerjoin(value2021, (
            (Valuation.reference == value2021.reference)
            & (Valuation.value > value2021.value)
            & (Valuation.period.startswith("2023"))
            & (value2021.period.startswith("2021"))
        )
    )
    .order_by(
        Valuation.reference,
        Valuation.period.desc(),
    )
    .distinct(Valuation.reference)
    .all()
)

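However, this does not compare the latest period's value against the latest value of each earlier period, and it seems badly overfitted (the years are hard-coded). Is this possible?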

python sqlalchemy
1 Answer

I haven't used sqlalchemy in a long time, but maybe this will be useful.

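from sqlalchemy import create_engine, Column, Integer, BigInteger, Float, String, text
# declarative_base has lived in sqlalchemy.orm since SQLAlchemy 1.4
from sqlalchemy.orm import declarative_base, sessionmaker

import pandas as pd

engine = create_engine('sqlite://')
# No engine is passed here: SQLAlchemy 2.0 no longer accepts a bind argument.
# The engine is supplied to sessionmaker() and create_all() instead.
Base = declarative_base()
Session = sessionmaker(bind=engine)
session = Session()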


class Valuation(Base):
    __tablename__ = 'valuation'
    id = Column(Integer, primary_key=True)
    reference = Column(BigInteger, index=True)
    value = Column(Float)
    period = Column(String)


Base.metadata.create_all(engine)
for valuation in [
    Valuation(reference=2433, value=110, period='2023-a'),
    Valuation(reference=5435, value=120, period='2023-b'),
    Valuation(reference=5435, value=110, period='2022-a'),
    Valuation(reference=2433, value=100, period='2022-b'),
    Valuation(reference=5435, value=105, period='2022-c'),
    Valuation(reference=2433, value=100, period='2021-a'),
]:
    session.add(valuation)

session.commit()

with engine.connect() as con:
    cursor = con.execute(text("""
        -- max by reference
        WITH r AS (
            SELECT reference,
                   max(value) AS value
              FROM valuation
             GROUP BY reference
        )
        
        SELECT r.*, p.period
          FROM r
          JOIN (
              SELECT reference, 
                     period,
                     value
                FROM valuation
          ) AS p ON (p.reference = r.reference AND p.value = r.value)
    """))

    print('result using sql:')
    print(cursor.all())
    # also you can use pandas

    df = pd.read_sql_query("""
        SELECT reference, 
               period,
               max(value) AS value
          FROM valuation
        GROUP BY reference, period
        ORDER BY value DESC
    """, con=con.connection)

    print('\ndataframe from database\n')
    print(df)
    print('\ndataframe after deduplication\n')
    print(df.drop_duplicates(['reference']))

Let's run it:

result using sql:
[(2433, 110.0, '2023-a'), (5435, 120.0, '2023-b')]

dataframe from database

   reference  period  value
0       5435  2023-b  120.0
1       2433  2023-a  110.0
2       5435  2022-a  110.0
3       5435  2022-c  105.0
4       2433  2021-a  100.0
5       2433  2022-b  100.0

dataframe after deduplication

   reference  period  value
0       5435  2023-b  120.0
1       2433  2023-a  110.0

Process finished with exit code 0
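The SQL above keeps the row with the highest value for each reference. That matches the expected output for the sample data, but it does not check that the row belongs to the most recent period. Below is a sketch that follows the question's wording more closely, using only the standard-library sqlite3 module so it runs without SQLAlchemy. It assumes the year is the first four characters of period, that period strings sort chronologically, and that each (reference, period) pair appears only once. ROWS, QUERY and find_increases are names invented for this example.

```python
import sqlite3

# (reference, value, period) tuples copied from the sample data
ROWS = [
    (2433, 110.0, "2023-a"),
    (5435, 120.0, "2023-b"),
    (5435, 110.0, "2022-a"),
    (2433, 100.0, "2022-b"),
    (5435, 105.0, "2022-c"),
    (2433, 100.0, "2021-a"),
]

QUERY = """
    WITH yearly AS (
        -- last period of each year, per reference
        SELECT reference,
               substr(period, 1, 4) AS year,
               max(period) AS period
          FROM valuation
         GROUP BY reference, substr(period, 1, 4)
    ),
    latest AS (
        -- value measured in that last period
        SELECT v.reference, y.year, v.period, v.value
          FROM yearly AS y
          JOIN valuation AS v
            ON v.reference = y.reference AND v.period = y.period
    ),
    newest AS (
        -- most recent year with data, per reference
        SELECT reference, max(year) AS year
          FROM latest
         GROUP BY reference
    )
    SELECT cur.reference, cur.value, cur.period
      FROM newest AS n
      JOIN latest AS cur
        ON cur.reference = n.reference AND cur.year = n.year
     WHERE EXISTS (
         -- at least one earlier year whose latest value is lower
         SELECT 1
           FROM latest AS prev
          WHERE prev.reference = cur.reference
            AND prev.year < cur.year
            AND prev.value < cur.value
     )
     ORDER BY cur.reference
"""


def find_increases(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE valuation ("
            "id INTEGER PRIMARY KEY, reference INTEGER, value REAL, period TEXT)"
        )
        conn.executemany(
            "INSERT INTO valuation (reference, value, period) VALUES (?, ?, ?)",
            rows,
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(find_increases(ROWS))
```

The same statement could presumably be passed to text() and executed through the SQLAlchemy connection used above, since it relies only on common table expressions, which the answer's own query already uses.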