How do I ungroup a Polars DataFrame in Python?

Question

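I have a Polars DataFrame in which one column contains a repeating pattern. I grouped the rows by that pattern and added a new column to the grouped DataFrame. Now I need to unpack (ungroup) this DataFrame. How can I do that in Polars?

My original DataFrame looks like this:

file	col1	col2
A	cell 1	cell 2
B	cell 3	cell 4
A	cell 5	cell 6
B	cell 7	cell 8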

I ran a groupby on FILE to group the DataFrame and then added the new column I need, which gave the following output.

file	col1	col2	folder
A	[cell 1, cell 5]	[cell 2, cell 6]	[file1, file2]
B	[cell 3, cell 7]	[cell 4, cell 8]	[file1, file2]

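Now I want to ungroup this DataFrame back into its original shape while keeping the new column. How can I do that? My actual DataFrame is large, with many rows and columns, so iterating over it is slow and inefficient. Is there a function that can be applied to the whole DataFrame instead of iterating column by column?

Desired final output:

file	col1	col2	folder
A	cell 1	cell 2	file1
B	cell 3	cell 4	file1
A	cell 5	cell 6	file2
B	cell 7	cell 8	file2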

Here is what I have done so far:

dfg = df.groupby('FILE').agg(pl.all())  # group the rows by file first
newdf = dfg.with_columns(pl.repeat(['file1', 'file2', 'file3'], dfg.height))  # add the desired column

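What is an efficient way to get the desired output? Note that my DataFrame is very large, so iterating column by column is very time-consuming.

PS: Fixed a typo in the format of the final table. In the file column, because the entries repeat after a few rows, each repetition should be assigned a new folder name.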

Answer 1

It looks like you are trying to "enumerate" each group.

You can use .cumcount() to do this.

df = pl.from_repr("""
┌──────┬─────────┬──────────┐
│ file ┆ col1    ┆ col2     │
│ ---  ┆ ---     ┆ ---      │
│ str  ┆ str     ┆ str      │
╞══════╪═════════╪══════════╡
│ A    ┆ cell 1  ┆ cell 2   │
│ B    ┆ cell 3  ┆ cell 4   │
│ A    ┆ cell 5  ┆ cell 6   │
│ B    ┆ cell 7  ┆ cell 8   │
│ A    ┆ cell 9  ┆ cell 10  │
│ B    ┆ cell 11 ┆ cell 12  │
│ A    ┆ cell 13 ┆ cell 14  │
│ B    ┆ cell 15 ┆ cell 16  │
│ A    ┆ cell 17 ┆ cell 18  │
│ B    ┆ cell 19 ┆ cell 20  │
└──────┴─────────┴──────────┘
""")

df.with_columns(folder = 
   pl.col("file").cumcount().over("file")
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ u32    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ 0      │
│ B    ┆ cell 3  ┆ cell 4  ┆ 0      │
│ A    ┆ cell 5  ┆ cell 6  ┆ 1      │
│ B    ┆ cell 7  ┆ cell 8  ┆ 1      │
│ A    ┆ cell 9  ┆ cell 10 ┆ 2      │
│ B    ┆ cell 11 ┆ cell 12 ┆ 2      │
│ A    ┆ cell 13 ┆ cell 14 ┆ 3      │
│ B    ┆ cell 15 ┆ cell 16 ┆ 3      │
│ A    ┆ cell 17 ┆ cell 18 ┆ 4      │
│ B    ┆ cell 19 ┆ cell 20 ┆ 4      │
└──────┴─────────┴─────────┴────────┘
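To make clear what that expression computes, here is a pure-Python sketch of the same per-group running count. It only illustrates the semantics (the helper name cumcount_per_group is made up for this example). On a large DataFrame you would keep the Polars expression rather than a Python loop.

```python
from collections import defaultdict


def cumcount_per_group(keys):
    """Return, for each key, how many times that key appeared earlier in the list."""
    seen = defaultdict(int)  # running count per group key
    out = []
    for key in keys:
        out.append(seen[key])  # 0 for a group's first row, 1 for its second, ...
        seen[key] += 1
    return out


files = ["A", "B", "A", "B", "A", "B"]
counts = cumcount_per_group(files)
print(counts)  # [0, 0, 1, 1, 2, 2]
```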

You can use modular arithmetic to turn this into a "repeating sequence".

df.with_columns(folder = 
   pl.col("file").cumcount().over("file").mod(3)
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ u32    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ 0      │
│ B    ┆ cell 3  ┆ cell 4  ┆ 0      │
│ A    ┆ cell 5  ┆ cell 6  ┆ 1      │
│ B    ┆ cell 7  ┆ cell 8  ┆ 1      │
│ A    ┆ cell 9  ┆ cell 10 ┆ 2      │
│ B    ┆ cell 11 ┆ cell 12 ┆ 2      │
│ A    ┆ cell 13 ┆ cell 14 ┆ 0      │
│ B    ┆ cell 15 ┆ cell 16 ┆ 0      │
│ A    ┆ cell 17 ┆ cell 18 ┆ 1      │
│ B    ┆ cell 19 ┆ cell 20 ┆ 1      │
└──────┴─────────┴─────────┴────────┘

.map_dict() is one possible way to replace the numbers.

df.with_columns(folder = 
   pl.col("file").cumcount().over("file").mod(3)
     .map_dict(dict(enumerate([
        "file1",
        "file2",
        "file3"
     ])))
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ str    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ file1  │
│ B    ┆ cell 3  ┆ cell 4  ┆ file1  │
│ A    ┆ cell 5  ┆ cell 6  ┆ file2  │
│ B    ┆ cell 7  ┆ cell 8  ┆ file2  │
│ A    ┆ cell 9  ┆ cell 10 ┆ file3  │
│ B    ┆ cell 11 ┆ cell 12 ┆ file3  │
│ A    ┆ cell 13 ┆ cell 14 ┆ file1  │
│ B    ┆ cell 15 ┆ cell 16 ┆ file1  │
│ A    ┆ cell 17 ┆ cell 18 ┆ file2  │
│ B    ┆ cell 19 ┆ cell 20 ┆ file2  │
└──────┴─────────┴─────────┴────────┘

Answer 2

You can use explode:

dfg.explode(pl.exclude('file'))

Overall, though, your problem is probably best solved with a join or some kind of over expression:

df = pl.DataFrame(
    {
        'file': ['A', 'B'] * 2,
        'col1': [f'cell {i}' for i in range(1, 9, 2)],
        'col2': [f'cell {i}' for i in range(2, 9, 2)],
    }
)
df2 = pl.DataFrame({'file': ['A', 'B'], 'folder': ['file1', 'file2']})

df.join(df2, on='file')

shape: (4, 4)
┌──────┬────────┬────────┬────────┐
│ file ┆ col1   ┆ col2   ┆ folder │
│ ---  ┆ ---    ┆ ---    ┆ ---    │
│ str  ┆ str    ┆ str    ┆ str    │
╞══════╪════════╪════════╪════════╡
│ A    ┆ cell 1 ┆ cell 2 ┆ file1  │
│ B    ┆ cell 3 ┆ cell 4 ┆ file2  │
│ A    ┆ cell 5 ┆ cell 6 ┆ file1  │
│ B    ┆ cell 7 ┆ cell 8 ┆ file2  │
└──────┴────────┴────────┴────────┘
