Select all columns ending with a pattern in Polars and add new columns without the pattern

Question

I have the following DataFrame:

import polars as pl
import numpy as np

df = pl.DataFrame({
    "nrs": [1, 2, 3, None, 5],
    "names_A0": ["foo", "ham", "spam", "egg", None],
    "random_A0": np.random.rand(5),
    "A_A2": [True, True, False, False, False],
})
digit = 0

For each column X whose name ends with the string suf = f'_A{digit}', I want to add to df an identical column whose name is the name of X without suf.

In the example, I need to add the columns names and random to the original DataFrame df, with the same contents as the columns names_A0 and random_A0, respectively.
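To pin down the renaming rule being asked for, it can be stated on the column names alone. A minimal plain-Python sketch, assuming the column list of the example DataFrame above; the helper name stripped_names is hypothetical:

```python
def stripped_names(columns, digit):
    """Hypothetical helper: map each column ending with _A{digit} to its name without that suffix."""
    suf = f"_A{digit}"
    # keep only matching columns; slice off exactly len(suf) trailing characters
    return {c: c[: -len(suf)] for c in columns if c.endswith(suf)}

columns = ["nrs", "names_A0", "random_A0", "A_A2"]
print(stripped_names(columns, 0))  # {'names_A0': 'names', 'random_A0': 'random'}
```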

Answer 1

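You can do this with Polars selectors and some basic string manipulation. Depending on how you expect the problem to evolve, you can go straight to regular expressions, or use polars.selectors.ends_with together with str.removesuffix.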

String suffix operations

This approach uses

- polars.selectors.ends_with # find columns ending with string
- str.removesuffix           # remove suffix from end of string

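str.removesuffix (Python 3.9+) removes the suffix only when it is an exact trailing match, and only once. That differs from str.rstrip, which treats its argument as a set of characters. A small comparison on made-up column names:

```python
suffix = "_A0"
samples = ["names_A0", "x_A0_A0", "names_A2", "DATA_A0"]

# removesuffix: drop the exact trailing substring once, otherwise leave the name unchanged
removed = [s.removesuffix(suffix) for s in samples]
print(removed)   # ['names', 'x_A0', 'names_A2', 'DATA']

# rstrip: drop any trailing run of the characters '_', 'A', '0' (usually not what you want)
stripped = [s.rstrip(suffix) for s in samples]
print(stripped)  # ['names', 'x', 'names_A2', 'DAT']
```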
Put together, this looks like:

import polars as pl
from polars import selectors as cs
import numpy as np

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names_A0": ["foo", "ham", "spam", "egg", None],
        "random_A0": np.random.rand(5),
        "A_A2": [True, True, False, False, False],
    }
)
digit = 0
suffix = f'_A{digit}'

print(
    # keep original A0 columns
    df.with_columns(
        cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
    ),
    # shape: (5, 6)
    # ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
    # │ nrs  ┆ names_A0 ┆ random_A0 ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---      ┆ ---       ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ str      ┆ f64       ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ foo      ┆ 0.713324  ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ ham      ┆ 0.980031  ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ spam     ┆ 0.242768  ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ egg      ┆ 0.528783  ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ null     ┆ 0.583206  ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴──────────┴───────────┴───────┴───────┴──────────┘


    # drop original A0 columns
    df.select(
        ~cs.ends_with(suffix),
        cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
    ),
    # shape: (5, 4)
    # ┌──────┬───────┬───────┬──────────┐
    # │ nrs  ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴───────┴───────┴──────────┘

    sep='\n\n'
)

Regular expressions

Alternatively, you can use a regular expression to detect a whole family of suffix patterns, using

- polars.selectors.matches  # find columns matching a pattern
- re.sub                    # substitute in string based on pattern
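As an illustration of matching a range of suffixes rather than one fixed digit, a pattern such as _A\d+$ (an assumed example, not taken from the answer) catches _A0, _A2, _A12 and so on. Shown here on column names only, with re.search and re.sub; the column score_A12 is made up:

```python
import re

columns = ["nrs", "names_A0", "random_A0", "A_A2", "score_A12"]
pattern = r"_A\d+$"  # "_A", one or more digits, then end of string

matched = [c for c in columns if re.search(pattern, c)]
renamed = [re.sub(pattern, "", c) for c in matched]
print(matched)  # ['names_A0', 'random_A0', 'A_A2', 'score_A12']
print(renamed)  # ['names', 'random', 'A', 'score']
```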

We need to make sure our pattern ends with '$' to anchor it to the end of the string.
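To see why the $ anchor matters, compare an anchored and an unanchored pattern on a hypothetical column name that contains _A0 in the middle rather than at the end:

```python
import re

digit = 0
anchored = rf"_A{digit}$"   # suffix only
unanchored = rf"_A{digit}"  # matches anywhere in the name

name = "raw_A0_total"  # hypothetical column name with _A0 mid-string

print(bool(re.search(anchored, name)))    # False: _A0 is not a suffix here
print(bool(re.search(unanchored, name)))  # True: matches in the middle
print(re.sub(unanchored, "", name))       # raw_total (unintended rename)
print(re.sub(anchored, "", "names_A0"))   # names
```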

import polars as pl
from polars import selectors as cs
import numpy as np
import re

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names_A0": ["foo", "ham", "spam", "egg", None],
        "random_A0": np.random.rand(5),
        "A_A2": [True, True, False, False, False],
    }
)
digit = 0
suffix = fr'_A{digit}$'

print(
    # keep original A0 columns
    df.with_columns(
        cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
    ),
    # shape: (5, 6)
    # ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
    # │ nrs  ┆ names_A0 ┆ random_A0 ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---      ┆ ---       ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ str      ┆ f64       ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ foo      ┆ 0.713324  ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ ham      ┆ 0.980031  ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ spam     ┆ 0.242768  ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ egg      ┆ 0.528783  ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ null     ┆ 0.583206  ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴──────────┴───────────┴───────┴───────┴──────────┘


    # drop original A0 columns
    df.select(
        ~cs.matches(suffix),
        cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
    ),
    # shape: (5, 4)
    # ┌──────┬───────┬───────┬──────────┐
    # │ nrs  ┆ A_A2  ┆ names ┆ random   │
    # │ ---  ┆ ---   ┆ ---   ┆ ---      │
    # │ i64  ┆ bool  ┆ str   ┆ f64      │
    # ╞══════╪═══════╪═══════╪══════════╡
    # │ 1    ┆ true  ┆ foo   ┆ 0.713324 │
    # │ 2    ┆ true  ┆ ham   ┆ 0.980031 │
    # │ 3    ┆ false ┆ spam  ┆ 0.242768 │
    # │ null ┆ false ┆ egg   ┆ 0.528783 │
    # │ 5    ┆ false ┆ null  ┆ 0.583206 │
    # └──────┴───────┴───────┴──────────┘

    sep='\n\n'
)

Answer 2

You can use Polars column selectors to select the relevant columns, and then use .name.map to rename the output of the selector expression.

import polars.selectors as cs

df.with_columns(cs.matches(f"_A{digit}$").name.map(lambda name: name[:-3]))

shape: (5, 6)
┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
│ nrs  ┆ names_A0 ┆ random_A0 ┆ A_A2  ┆ names ┆ random   │
│ ---  ┆ ---      ┆ ---       ┆ ---   ┆ ---   ┆ ---      │
│ i64  ┆ str      ┆ f64       ┆ bool  ┆ str   ┆ f64      │
╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
│ 1    ┆ foo      ┆ 0.626253  ┆ true  ┆ foo   ┆ 0.626253 │
│ 2    ┆ ham      ┆ 0.480437  ┆ true  ┆ ham   ┆ 0.480437 │
│ 3    ┆ spam     ┆ 0.789309  ┆ false ┆ spam  ┆ 0.789309 │
│ null ┆ egg      ┆ 0.126665  ┆ false ┆ egg   ┆ 0.126665 │
│ 5    ┆ null     ┆ 0.522989  ┆ false ┆ null  ┆ 0.522989 │
└──────┴──────────┴───────────┴───────┴───────┴──────────┘

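Note: in the example above, we select all columns whose name contains the string "_A", followed by digit, followed by the end of the string ($). Since the suffix is guaranteed to have length 3 (as long as digit is a single digit), the new name is the original name with the last 3 characters removed.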
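The fixed name[:-3] slice relies on digit being a single digit. A slightly more general sketch derives the slice length from the suffix itself; the two-digit value digit = 10 and the column name are made-up examples:

```python
digit = 10
suffix = f"_A{digit}"  # "_A10", length 4

name = "names_A10"
fixed = name[:-3]              # assumes a 3-character suffix
general = name[: -len(suffix)]  # slice length taken from the actual suffix
print(fixed)    # names_  (stray underscore left behind)
print(general)  # names
```

In the selector version this would correspond to .name.map(lambda name: name[: -len(suffix)]), with suffix defined as above.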
