featuretools: basic aggregations on time measurements


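I am using featuretools (version 1.1x). I have read the documentation and also searched here,

but I still struggle to find out how to do simple things such as SELECT MIN(datetime_field_1)..

I also checked list_primitives(); the time-related primitives don't seem to be what I need.

I can do this for numeric fields, but I can't seem to do it on datetime fields.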

https://featuretools.alteryx.com/en/stable/

I just want to get min(timestamp) and max(timestamp) grouped by customer_id, but the max/min primitives only work on numeric columns.

import featuretools as ft
import pandas as pd
import numpy as np

# make some random data
n = 100
events_df = pd.DataFrame({
    "id" : range(n),
    "customer_id": np.random.choice(["a", "b", "c"], n),
    "timestamp": pd.date_range("Jan 1, 2019", freq="1h", periods=n),
    "amount": np.random.rand(n) * 100 
})

def to_part_of_day(x):
    if x < 12:
        return "morning"
    elif x < 18:
        return "afternoon"
    else:
        return "evening"
# build the EntitySet, using the existing "id" column as the index
es = ft.EntitySet(id='my_set')
es = es.add_dataframe(dataframe=events_df, dataframe_name='events', time_index='timestamp', index='id')

# attempt: min/max aggregations, grouped by customer_id
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name='events',
    agg_primitives=['min', 'max'],
    trans_primitives=[],  # the keyword argument is trans_primitives (plural)
    primitive_options={
        'max': {
            "include_groupby_columns": {"events": ["customer_id"]}
        }
    }
)


How can I get max(amount) and max(timestamp) for each customer_id? Thanks! It feels silly to ask such a basic question after reading featuretools.alteryx.com and its GitHub examples..
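For reference, the result being asked for (the SQL-style MIN/MAX per customer_id) can be sketched in plain pandas. This is not a featuretools solution and it sidesteps the primitive question entirely; it only pins down the expected output. The small fixed dataset and the output column names (min_timestamp, max_timestamp, max_amount) are assumptions made for illustration.

```python
import pandas as pd

# Small deterministic stand-in for events_df (illustrative values, not the random data above)
events_df = pd.DataFrame({
    "id": range(6),
    "customer_id": ["a", "b", "a", "c", "b", "a"],
    "timestamp": pd.to_datetime([
        "2019-01-01 00:00", "2019-01-01 01:00", "2019-01-01 02:00",
        "2019-01-01 03:00", "2019-01-01 04:00", "2019-01-01 05:00",
    ]),
    "amount": [10.0, 20.0, 5.0, 7.5, 30.0, 12.0],
})

# Named aggregation: one output column per (source column, aggregation) pair.
# min/max work on datetime columns here just as they do on numeric ones.
per_customer = events_df.groupby("customer_id").agg(
    min_timestamp=("timestamp", "min"),
    max_timestamp=("timestamp", "max"),
    max_amount=("amount", "max"),
)

print(per_customer)
```

If these values are needed next to each event row, per_customer can be joined back with events_df.merge(per_customer, on="customer_id").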

feature-extraction feature-engineering featuretools