Suppose I have the following Polars DataFrame:
import polars as pl

all_items = pl.DataFrame(
    {
        "ISO_codes": ["fin", "nor", "eng", "eng", "swe"],
        "ISO_codes1": ["fin", "nor", "eng", "eng", "eng"],
        "ISO_codes2": ["fin", "ice", "eng", "eng", "eng"],
        "OtherColumn": ["1", "2", "3", "4", "5"],
    }
)
How can I implement a method like check_for_equality:
def check_for_equality(all_items, columns_to_check_for_equality):
    return all_items.with_columns(
        pl.col_equals(columns_to_check_for_equality).alias("ISO_EQUALS")
    )
so that when I call it:
columns_to_check_for_equality = ["ISO_codes", "ISO_codes1", "ISO_codes2"]
resulting_df = check_for_equality(all_items, columns_to_check_for_equality)
I get the following result:
resulting_df == pl.DataFrame(
    {
        "ISO_codes": ["fin", "nor", "eng", "eng", "swe"],
        "ISO_codes1": ["fin", "nor", "eng", "eng", "eng"],
        "ISO_codes2": ["fin", "ice", "eng", "eng", "eng"],
        "OtherColumn": ["1", "2", "3", "4", "5"],
        "ISO_EQUALS": [True, False, True, True, False],
    }
)
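Note that I don't "know" the column names at the time of the actual check, and the number of columns can vary between calls.
Is there something like "col_equals" in the Polars API?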
You can compare every column against the first one and use pl.all_horizontal to do the aggregation:
all_items.with_columns(
    pl.all_horizontal(
        pl.col(columns_to_check_for_equality)
        .eq(pl.col(columns_to_check_for_equality[0]))
    ).alias('ISO_EQUALS')
)
Output:
┌───────────┬────────────┬────────────┬─────────────┬────────────┐
│ ISO_codes ┆ ISO_codes1 ┆ ISO_codes2 ┆ OtherColumn ┆ ISO_EQUALS │
│ ---       ┆ ---        ┆ ---        ┆ ---         ┆ ---        │
│ str       ┆ str        ┆ str        ┆ str         ┆ bool       │
╞═══════════╪════════════╪════════════╪═════════════╪════════════╡
│ fin       ┆ fin        ┆ fin        ┆ 1           ┆ true       │
│ nor       ┆ nor        ┆ ice        ┆ 2           ┆ false      │
│ eng       ┆ eng        ┆ eng        ┆ 3           ┆ true       │
│ eng       ┆ eng        ┆ eng        ┆ 4           ┆ true       │
│ swe       ┆ eng        ┆ eng        ┆ 5           ┆ false      │
└───────────┴────────────┴────────────┴─────────────┴────────────┘