I have a dataframe with data on house prices. Example:

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})
```
I need to calculate the day-over-day change in the price per square meter, and the contribution of each factor to that change.
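Influencing factors:

Calculation rules:
- `cur`
- New premises (`new`) appear every day.
- Premises with status `nf_sale` can be added and then become `new`.
- Sold premises (`sold`) are not included in the price calculation.

Desired result: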
```
         date  area     price       avg  avg/avg_yest  by_price_change  by_new_premises  by_sale
0  2024-01-01   500  50000000  100000.0        0.0000             0.00             0.00   0.0000
1  2024-01-02   500  56170000  112340.0        0.1234            0.05*            0.04*  0.0334*
```

\* random placeholder values
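The non-random columns of the desired table can be checked by hand, assuming (as the example code below does) that only `cur` and `new` rows count toward a day's totals. Note that the `avg/avg_yest` column actually holds the relative change avg/avg_yest - 1:

$$
\overline{p}_{\text{2024-01-01}} = \frac{5 \times 10\,000\,000}{5 \times 100} = 100\,000
$$

$$
\overline{p}_{\text{2024-01-02}} = \frac{11\,080\,000 + 11\,090\,000 + 10\,000\,000 + 12\,000\,000 + 12\,000\,000}{5 \times 100} = \frac{56\,170\,000}{500} = 112\,340
$$

$$
\frac{\overline{p}_{\text{2024-01-02}}}{\overline{p}_{\text{2024-01-01}}} - 1 = \frac{112\,340}{100\,000} - 1 = 0.1234
$$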
Any help would be greatly appreciated!
Here is the example I wrote:

```python
import pandas as pd
import numpy as np
# Create the dataframe
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})
# Convert date to datetime for operations
df['date'] = pd.to_datetime(df['date'])
# Keep only 'cur' and 'new' rows for the calculation
df_filtered = df[df['status'].isin(['cur', 'new'])]
# Calculate the daily total price and total area
daily_totals = df_filtered.groupby('date').agg({'area': 'sum', 'price': 'sum'}).reset_index()
# Calculate average price per square meter
daily_totals['avg'] = daily_totals['price'] / daily_totals['area']
# Relative change in average price per square meter (avg / avg_yest - 1); first day -> 0
daily_totals['avg_change'] = daily_totals['avg'].pct_change().fillna(0)
# Random placeholders for by_price_change, by_new_premises, by_sale
daily_totals['by_price_change'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_new_premises'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_sale'] = np.random.rand(len(daily_totals)) * 0.1
print(daily_totals)
```
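One possible way to replace the random `by_*` columns is a sequential (chain) decomposition: start from yesterday's average, remove the sold listings, then apply today's prices to the remaining `cur` listings, then add the `new` listings, and attribute the change in the average at each step to one factor. This is a hedged sketch, not the only valid method. It assumes that yesterday's base for a date is the rows with status `cur` or `sold` on that date, valued at `price_yest`, and that the steps are applied in the order sale, price change, new; a different order gives a different (but still additive) split. The helpers `per_sqm` and `decompose` are hypothetical names.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'num': [1, 2, 3, 4, 5, 6, 7] * 2,
    'date': ['2024-01-01'] * 7 + ['2024-01-02'] * 7,
    'area': [100] * 14,
    'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000,
              11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
    'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000,
                   10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
    'status': ['cur'] * 5 + ['nf_sale'] * 2 + ['cur'] * 3 + ['sold'] * 2 + ['new'] * 2,
})
df['date'] = pd.to_datetime(df['date'])


def per_sqm(prices, areas):
    """Average price per square meter of a set of listings (total price / total area)."""
    return prices.sum() / areas.sum()


def decompose(day):
    """Split avg / avg_yest - 1 for one date into three additive effects.

    Assumed rules (hypothetical, adjust to the real ones):
      - yesterday's base: rows that are 'cur' or 'sold' today, valued at price_yest
      - today's set: 'cur' + 'new' rows at today's price
      - step order: sale -> price change -> new premises
    """
    cur = day[day['status'] == 'cur']
    base = day[day['status'].isin(['cur', 'sold'])]
    today = day[day['status'].isin(['cur', 'new'])]

    avg_yest = per_sqm(base['price_yest'], base['area'])
    avg_after_sale = per_sqm(cur['price_yest'], cur['area'])   # sold rows removed
    avg_after_change = per_sqm(cur['price'], cur['area'])      # cur rows re-priced
    avg_today = per_sqm(today['price'], today['area'])         # new rows added

    # Every step is divided by the same avg_yest, so the effects sum to avg_change
    return {
        'area': today['area'].sum(),
        'price': today['price'].sum(),
        'avg': avg_today,
        'avg_change': avg_today / avg_yest - 1,
        'by_price_change': (avg_after_change - avg_after_sale) / avg_yest,
        'by_new_premises': (avg_today - avg_after_change) / avg_yest,
        'by_sale': (avg_after_sale - avg_yest) / avg_yest,
    }


# One output row per date (groupby sorts dates ascending by default)
rows = [{'date': date, **decompose(day)} for date, day in df.groupby('date')]
result = pd.DataFrame(rows)
print(result)
```

On the sample data this gives `by_sale` = 0 for 2024-01-02 (the sold flats were priced at the same 100 000 per m² as the remaining ones yesterday), `by_price_change` ≈ 0.0723 and `by_new_premises` ≈ 0.0511, which add up to the 0.1234 in the desired table.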