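For performance reasons, I want to convert some pandas-based scripts to Polars. I need to do a group-by and compute an exponentially weighted mean whose half-life is defined in terms of the datetime values. Unfortunately, I couldn't really find a Polars cookbook for this, so I relied on this answer to get started and arrived at the following approximation: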
import pandas as pd
import polars as pl
import random
from datetime import datetime, timedelta
# Define the list of persons
persons = ['Person A', 'Person B', 'Person C', 'Person D']
# Generate random data for the DataFrame
# start_date = datetime.now() - timedelta(days=365)
df = pd.DataFrame(
    {'person': [random.choice(persons) for _ in range(50)],
     'rating': [random.randint(75, 110) for _ in range(50)],
     'date': [datetime(2022, 6, 1, 0, 0, 0)
              + timedelta(days=random.randint(0, 365))
              for _ in range(50)]}
)
df.sort_values(['date'], inplace=True)
# To be used with polars
dl = pl.from_dataframe(df)
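# Reference pandas version to convert: per-person EWM with a 30-day half-life
# measured on the 'date' column, shifted by one row so each row only sees
# earlier ratings (the first row of each person gets 80).
# group_keys=False keeps the original row index, so the assignment aligns by index
# (with the default group_keys=True, pandas >= 2.0 prepends 'person' to the index
# and .to_numpy() would return the values in group order, not row order).
df['EWM_30d'] = (
    df.groupby('person', sort=False, group_keys=False)[['rating', 'date']]
    .apply(lambda x: x['rating']
           .ewm(halflife='30d', times=x['date'])
           .mean()
           .shift(1, fill_value=80)
           .round(2))
)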
# Initial polars version
dl = dl.rolling('date', by='person', period='100000d').agg(
    pl.col('rating')
    .ewm_mean(half_life=30)
    .shift(1, fill_value=80)
    .last()
    .alias('EWM_30d')
)
With this initial Polars version I managed to get something similar, but it has several flaws: