How to split a pandas column holding multiple values into one row per value, grouped by item?

Question    Votes: 1    Answers: 1

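I am trying to create a new DataFrame by splitting a column that holds multiple values, so that each row contains only one value.

I tried a few groupby operations, but I can't seem to separate the values or organize them by user.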

Input:

   item                               title                                      feature
0     1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1     2                      Jumanji (1995)                   Adventure|Children|Fantasy
2     3             Grumpier Old Men (1995)                               Comedy|Romance
3     4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4     5  Father of the Bride Part II (1995)                                       Comedy

Desired output:

   item    feature
0     1  Adventure
1     1  Animation
2     1   Children
3     1     Comedy
4     1    Fantasy
python pandas dataframe transformation
1 Answer
1 vote

You need str.split, followed by stack:

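# Split each feature string on '|' into one column per value,
# then stack the columns into rows (one value per row).
r = df.set_index('item').feature.str.split('|', expand=True).stack()
# Keep only the 'item' level of the (item, position) MultiIndex.
r.index = r.index.get_level_values(0)

r.reset_index(name='feature')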

    item    feature
0      1  Adventure
1      1  Animation
2      1   Children
3      1     Comedy
4      1    Fantasy
5      2  Adventure
6      2   Children
7      2    Fantasy
8      3     Comedy
9      3    Romance
10     4     Comedy
11     4      Drama
12     4    Romance
13     5     Comedy
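
To make each step of the split-and-stack chain visible, here is a minimal, self-contained sketch. The sample frame is rebuilt from the first three rows of the question, and the intermediate names `wide` and `stacked` are introduced only for illustration. The explicit `dropna()` is an assumption added for safety: depending on the pandas version, `stack()` may keep the padding values that `expand=True` creates for shorter rows.

```python
import pandas as pd

# Sample data copied from the question (only the columns the answer uses).
df = pd.DataFrame({
    "item": [1, 2, 3],
    "feature": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# Step 1: one column per value; shorter rows are padded with missing values.
wide = df.set_index("item")["feature"].str.split("|", expand=True)

# Step 2: stack() turns the columns into rows, giving a Series indexed by
# (item, position). dropna() removes any padding that survives stacking.
stacked = wide.stack().dropna()

# Step 3: discard the position level, then move 'item' back into a column.
result = stacked.reset_index(level=1, drop=True).reset_index(name="feature")
print(result)
```

Using `reset_index(level=1, drop=True)` here is a stylistic alternative to assigning `r.index.get_level_values(0)`; both leave only the `item` level in the index.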

Another option is to use np.repeat:

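# Split each feature string into a list of values.
u = df.set_index('item').feature.str.split('|')
pd.DataFrame({
    # Repeat each item once per value in its list.
    'item': np.repeat(u.index, u.str.len()),
    # Flatten the lists, preserving row order.
    'feature': [y for x in u for y in x]
})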

    item    feature
0      1  Adventure
1      1  Animation
2      1   Children
3      1     Comedy
4      1    Fantasy
5      2  Adventure
6      2   Children
7      2    Fantasy
8      3     Comedy
9      3    Romance
10     4     Comedy
11     4      Drama
12     4    Romance
13     5     Comedy
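
The np.repeat variant works because the repeat counts and the flattened list are built from the same Series, so they stay aligned row by row. The sketch below spells that out on a reduced copy of the question's data. The names `lists` and `counts` are hypothetical helpers, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the columns the answer uses).
df = pd.DataFrame({
    "item": [1, 2, 3],
    "feature": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# Each cell becomes a Python list of values, indexed by item.
lists = df.set_index("item")["feature"].str.split("|")

# Number of values per item: 5, 3 and 2 for the rows above.
counts = lists.str.len()

result = pd.DataFrame({
    # Repeat each item id once per value it carries.
    "item": np.repeat(lists.index.to_numpy(), counts.to_numpy()),
    # Flatten the lists in the same row order as the repeated ids.
    "feature": [value for values in lists for value in values],
})
print(result)
```

Because no intermediate wide frame is built, this version never creates padding values for the shorter rows.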