Polars (Python): collecting values into a list

Question (votes: 0, answers: 1)

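I'm trying to use group_by in Polars to find groups of rows where the same personal data appears under different IDs. The error I get is: Error: 'Expr' object has no attribute 'collect_list'. Here is my code: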

import polars as pl
from datetime import datetime

# Create a sample DataFrame with detailed personal data where all fields are the same except the client ID
df = pl.DataFrame({
    "PER_FirstName": ["John", "John", "John"],
    "PER_LastName": ["Doe", "Doe", "Doe"],
    "PER_DOB": [datetime(1990, 5, 1), datetime(1990, 5, 1), datetime(1990, 5, 1)],
    "PER_StreetAddress": ["123 Elm St", "123 Elm St", "123 Elm St"],
    "PER_ClientID": [101, 102, 103]
})


# Using group_by and trying a potentially available function.
try:
    result = df.group_by(['PER_FirstName', 'PER_LastName', 'PER_DOB', 'PER_StreetAddress']) \
               .agg([
                   pl.col('PER_ClientID').n_unique().alias('unique_client_ids'),
                   pl.col('PER_ClientID').collect_list().alias('client_ids')  # Trying collect_list
               ])
    print(result)
except AttributeError as e:
    print("Error:", e)

Any ideas about what I'm doing wrong and how to change it so that it works? Thanks!

string aggregate-functions
1 Answer
0 votes

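I ended up using a combination of Polars and pandas to solve this. Apparently there is currently no Polars facility that collects things the way I wanted, so I took a hybrid approach: Polars does the grouping and aggregation, and pandas plus itables is used to display the result, as shown here: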

import polars as pl
from datetime import datetime
import pandas as pd
from itables import show

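# Aggregate to collect unique client IDs and count them
# (df is the sample DataFrame from the question; groupby was renamed to group_by in newer Polars releases)
result = df.group_by(['PER_FirstName', 'PER_LastName', 'PER_DOB', 'PER_StreetAddress']) \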
           .agg([
               pl.col('PER_ClientID').unique().alias('unique_client_ids'),
               pl.col('PER_ClientID').n_unique().alias('count_unique_ids')
           ])

# Keep only groups with more than one unique client ID
filtered_result = result.filter(pl.col('count_unique_ids') > 1)

# Convert Polars DataFrame to Pandas DataFrame
filtered_result_pd = filtered_result.to_pandas()

# Display the result as an interactive table using itables
from itables import init_notebook_mode

init_notebook_mode(all_interactive=True)

show(
    filtered_result_pd,
    classes="display nowrap compact",
    paging=False,
    buttons=["copyHtml5", "csvHtml5", "excelHtml5"],
    scrollX=True,
)