How to clean multiple values in a single cell using pandas

Question  Votes: 1  Answers: 2

I have a dataset in which one column contains multiple values, stored as a single string. This is the dataset

The Genres column holds multiple values per cell, and I think that is a problem for my model.

Is there a way to clean this column so that only one value is kept?

pandas dataframe data-analysis
2 Answers
0
Votes

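The strings in the Genres column are lists of tags. To make that data usable, I suggest turning them into factors, i.e. creating a separate column for each tag that indicates which rows the tag applies to. You can do it like this: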

import pandas as pd

# small subset of your data for demonstration
df = pd.DataFrame({'Name': ['Sudoku', 'Reversi', 'Morocco'], 
                   'Genres': ['Games, Strategy, Puzzle', 
                              'Games, Strategy, Board', 
                              'Games, Board, Strategy']})
display(df)
    Name        Genres
0   Sudoku      Games, Strategy, Puzzle
1   Reversi     Games, Strategy, Board
2   Morocco     Games, Board, Strategy
# split each string into a list of tags; splitting on ', ' (comma plus space)
# avoids tags with a leading space such as ' Strategy'
df['Genres'] = df['Genres'].str.split(pat=', ')

# collect all unique tags from those lists, sorted for a stable column order
tags = sorted(set(df['Genres'].explode()))

# create a new Boolean column for each tag
for tag in tags:
    df[tag] = [tag in genres for genres in df['Genres']]

display(df)
    Name     Genres                     Board   Games   Puzzle  Strategy
0   Sudoku   [Games, Strategy, Puzzle]  False   True    True    True
1   Reversi  [Games, Strategy, Board]   True    True    False   True
2   Morocco  [Games, Board, Strategy]   True    True    False   True

Note that this code is not optimized for speed. I just wanted to show how it can be done.
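
For larger data, the per-tag loop can probably be replaced by pandas' built-in `Series.str.get_dummies`. The following is a minimal sketch, not the answerer's method; it assumes the tags are always separated by exactly a comma and a space, and the names `dummies` and `result` are only illustrative.

```python
import pandas as pd

# same small demonstration subset as in the answer
df = pd.DataFrame({'Name': ['Sudoku', 'Reversi', 'Morocco'],
                   'Genres': ['Games, Strategy, Puzzle',
                              'Games, Strategy, Board',
                              'Games, Board, Strategy']})

# one 0/1 indicator column per tag; assumes ', ' is the only separator
dummies = df['Genres'].str.get_dummies(sep=', ')

# convert to Boolean and attach to the original frame
result = df.join(dummies.astype(bool))
```

Here the original Genres strings stay untouched, and the indicator columns come out in alphabetical order.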


0
Votes

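You can do

df['Genres'].str.split(",", n=1, expand=True)

Set n to the maximum number of splits you want; the string is split at up to that many "," separators. Then select the column you need.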
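
As a sketch of how this could be used to keep only one value per row, the example below keeps the first tag. It assumes the column is named Genres as in the question; `parts` and `FirstGenre` are illustrative names, not from the original post.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sudoku', 'Reversi', 'Morocco'],
                   'Genres': ['Games, Strategy, Puzzle',
                              'Games, Strategy, Board',
                              'Games, Board, Strategy']})

# n=1 means at most one split, so each row yields two pieces:
# column 0 is the first tag, column 1 is the rest of the string
parts = df['Genres'].str.split(",", n=1, expand=True)

# keep only the first tag, stripping any surrounding whitespace
df['FirstGenre'] = parts[0].str.strip()
```

In this sample every row starts with Games, so picking a different position (for example with a larger n and another column of `parts`) may be more informative.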
