Calculating the influence of individual factors on the final change

Question  Votes: 0  Answers: 1

I have a DataFrame containing data on the prices of premises. Example:

df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
                   'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',  '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
                   'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
                   'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})

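I need to calculate the change in the price per square meter from the previous day to the current day, and the contribution of each factor to that change.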

Factors:

  1. Price changes of individual premises
  2. Addition of new premises
  3. Sales

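Calculation rules:

  • The average price per square meter is calculated each day over current (cur) and new (new) premises only.
  • Premises that were previously not for sale (nf_sale) can be added, at which point they become new (new).
  • Sold (sold) premises are not included in the price calculation.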

Desired result:

     date      area price       avg      avg/avg_yest  by_price_change by_new_premises  by_sale
0   2024-01-01  500 50000000    100000.0    0.0000         0.00              0.00     0.0000
1   2024-01-02  500 56170000    112340.0    0.1234         0.05*             0.04*    0.0334*
* = random placeholder values
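
The non-placeholder numbers in this table can be checked by hand from the sample data. The short derivation below does that for 2024-01-02; reading the three starred values as additive contributions that should sum to the total change is an assumption, based only on the fact that the placeholders add up to 0.1234.

```latex
\[
\text{avg}_{t} = \frac{\sum_{i \in \{\text{cur},\,\text{new}\}} \text{price}_{i,t}}
                      {\sum_{i \in \{\text{cur},\,\text{new}\}} \text{area}_{i,t}}
             = \frac{11\,080\,000 + 11\,090\,000 + 10\,000\,000 + 12\,000\,000 + 12\,000\,000}{500}
             = 112\,340
\]
\[
\frac{\text{avg}_{t}}{\text{avg}_{t-1}} - 1 = \frac{112\,340}{100\,000} - 1 = 0.1234
\]
\[
\text{by\_price\_change} + \text{by\_new\_premises} + \text{by\_sale} = 0.05 + 0.04 + 0.0334 = 0.1234
\]
```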

Any help would be greatly appreciated!

python pandas math
1 answer
0
votes

I wrote an example:

import pandas as pd
import numpy as np

# Create the dataframe
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
                   'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',  '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
                   'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
                   'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})

# Convert date to datetime for operations
df['date'] = pd.to_datetime(df['date'])

# Filter out only 'cur' and 'new' for calculation
df_filtered = df[df['status'].isin(['cur', 'new'])]

# Calculate the daily total price and total area
daily_totals = df_filtered.groupby('date').agg({'area': 'sum', 'price': 'sum'}).reset_index()

# Calculate average price per square meter
daily_totals['avg'] = daily_totals['price'] / daily_totals['area']

# Calculate change in average price per square meter
daily_totals['avg_change'] = daily_totals['avg'].pct_change()
daily_totals['avg_change'] = daily_totals['avg_change'].fillna(0)

# Placeholders only: random numbers stand in for by_price_change, by_new_premises, by_sale
# (these are not an actual attribution of the change)
daily_totals['by_price_change'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_new_premises'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_sale'] = np.random.rand(len(daily_totals)) * 0.1

print(daily_totals)
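
The three by_* columns above are random placeholders. One possible way to fill them in, offered as a sketch rather than a definitive method, is chain substitution: start from yesterday's average and apply the factors one at a time (price changes, then sales, then new premises), recording how much the average moves at each step. The helper names `avg_per_m2` and `decompose_day` are hypothetical. The sketch also assumes that sold premises still appear in the current day's rows with a valid `price_yest`, so that the `cur` and `sold` rows together reproduce yesterday's pool, as they do in the sample data.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
                   'date': ['2024-01-01'] * 7 + ['2024-01-02'] * 7,
                   'area': [100] * 14,
                   'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000,
                             11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000,
                                  10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale',
                              'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})


def avg_per_m2(rows, price_col):
    """Total price divided by total area for the given rows."""
    return rows[price_col].sum() / rows['area'].sum()


def decompose_day(day):
    """Split one day's relative change in avg price per m2 into three parts.

    Assumption: the substitution order is price change -> sales -> new premises.
    A different order would shift the split between the factors.
    """
    cur = day[day['status'] == 'cur']
    yest_pool = day[day['status'].isin(['cur', 'sold'])]   # assumed to equal yesterday's pool
    today_pool = day[day['status'].isin(['cur', 'new'])]   # what counts today

    base = avg_per_m2(yest_pool, 'price_yest')   # yesterday's average
    after_price = avg_per_m2(yest_pool, 'price') # same premises, today's prices
    after_sale = avg_per_m2(cur, 'price')        # sold premises removed
    final = avg_per_m2(today_pool, 'price')      # new premises added

    return {
        'area': today_pool['area'].sum(),
        'price': today_pool['price'].sum(),
        'avg': final,
        'avg_change': (final - base) / base,
        'by_price_change': (after_price - base) / base,
        'by_sale': (after_sale - after_price) / base,
        'by_new_premises': (final - after_sale) / base,
    }


result = pd.DataFrame(
    [{'date': date, **decompose_day(day)} for date, day in df.groupby('date')]
)
print(result)
```

On the sample data this gives all zeros for 2024-01-01, and for 2024-01-02 a price-change contribution of 0.0434, a sales contribution of about 0.0289 and a new-premises contribution of about 0.0511, which sum to the total change of 0.1234. A day with no `cur` rows would produce a division by zero and needs separate handling.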