Polars: remove a substring from a column based on another column

Question

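Is there a Polars-native optimization that can replace the apply-lambda approach from this post for removing a substring from a column based on another column?

In the Polars DataFrame below, how can we remove the "_sub" substring from origin, where sub is the value of the sub column in the same row?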

import polars as pl

pl.DataFrame(
    {"origin": ["id1_COUNTRY", "id2_NAME"],
     "sub": ["COUNTRY", "NAME"]}
)

shape: (2, 2)
┌─────────────┬─────────┐
│ origin      ┆ sub     │
│ ---         ┆ ---     │
│ str         ┆ str     │
╞═════════════╪═════════╡
│ id1_COUNTRY ┆ COUNTRY │
│ id2_NAME    ┆ NAME    │
└─────────────┴─────────┘

The expected output should look like this:

shape: (2, 3)
┌─────────────┬─────────┬─────┐
│ origin      ┆ sub     ┆ out │
│ ---         ┆ ---     ┆ --- │
│ str         ┆ str     ┆ str │
╞═════════════╪═════════╪═════╡
│ id1_COUNTRY ┆ COUNTRY ┆ id1 │
│ id2_NAME    ┆ NAME    ┆ id2 │
└─────────────┴─────────┴─────┘

Answer 1

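The first obvious thing to try is .str.replace, but it only works when the pattern has the same length. Since you only want to remove the string and, at least in this example, the substring you are removing comes after the part you want to keep, you can use .str.split() followed by .list.first():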

import polars as pl

(
    pl.DataFrame(
        {"origin": ["id1_COUNTRY", "id2_NAME"],
         "sub": ["COUNTRY", "NAME"]}
    )
    .with_columns(
        # Split each row on "_" + that row's sub value, then keep the first piece.
        out=pl.col('origin')
        .str.split(pl.lit('_') + pl.col('sub'))
        .list.first()
    )
)
shape: (2, 3)
┌─────────────┬─────────┬─────┐
│ origin      ┆ sub     ┆ out │
│ ---         ┆ ---     ┆ --- │
│ str         ┆ str     ┆ str │
╞═════════════╪═════════╪═════╡
│ id1_COUNTRY ┆ COUNTRY ┆ id1 │
│ id2_NAME    ┆ NAME    ┆ id2 │
└─────────────┴─────────┴─────┘

Of course, for the data in this example, you could just split on "_" directly.
