How to clean up multiple values in a single cell using pandas

Question

I have a dataset in which one column stores multiple values in a single string. This is the dataset:

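I think the Genres column holds more values per cell than my model can use.

Is there a way to clean this column so that it keeps only one value?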

Answer 1

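The strings in the Genres column are lists of tags. To make this data usable, I suggest turning them into indicator (dummy) variables, i.e. creating a separate column for each tag that marks which rows the tag applies to. You can do it like this: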

import pandas as pd

# small subset of your data for demonstration
df = pd.DataFrame({'Name': ['Sudoku', 'Reversi', 'Morocco'], 
                   'Genres': ['Games, Strategy, Puzzle', 
                              'Games, Strategy, Board', 
                              'Games, Board, Strategy']})
print(df)  # display(df) only works inside Jupyter/IPython

    Name        Genres
0   Sudoku      Games, Strategy, Puzzle
1   Reversi     Games, Strategy, Board
2   Morocco     Games, Board, Strategy

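# split each string on commas and strip surrounding spaces,
# so that ' Strategy' and 'Strategy' are treated as the same tag
df['Genres'] = df['Genres'].str.split(',').apply(lambda lst: [t.strip() for t in lst])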

# collect all unique tags from those lists (sorted, so the column order is stable)
tags = sorted(set(df['Genres'].explode()))

# create a new Boolean column for each tag
for tag in tags:
    df[tag] = [tag in df['Genres'].loc[i] for i in df.index]

print(df)

    Name     Genres                     Board   Games   Puzzle  Strategy
0   Sudoku   [Games, Strategy, Puzzle]  False   True    True    True
1   Reversi  [Games, Strategy, Board]   True    True    False   True
2   Morocco  [Games, Board, Strategy]   True    True    False   True

Note that this code is not optimized for speed; it is only meant to show how it can be done.
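If speed matters, pandas has a built-in, vectorized way to build the same indicator columns: `Series.str.get_dummies(sep=...)`. The sketch below assumes the tags are separated by exactly `', '` (comma plus one space), as in the sample data; if your real data mixes separators, normalize them first. `get_dummies` returns 0/1 integer columns, so they are cast to `bool` here only to match the True/False output of the loop above.

```python
import pandas as pd

# Same small sample as above (assumed to mirror the real data's format)
df = pd.DataFrame({'Name': ['Sudoku', 'Reversi', 'Morocco'],
                   'Genres': ['Games, Strategy, Puzzle',
                              'Games, Strategy, Board',
                              'Games, Board, Strategy']})

# str.get_dummies splits each string on `sep` and returns one 0/1 column
# per distinct tag, with the tag columns sorted alphabetically
dummies = df['Genres'].str.get_dummies(sep=', ')

# Cast 0/1 to True/False and attach the indicator columns to the original frame
result = df.join(dummies.astype(bool))
print(result)
```

Unlike the loop, this leaves the original `Genres` strings unchanged and avoids a Python-level pass per tag.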

Answer 2

You can do:

df['Genres'].str.split(",", n=1, expand=True)

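Choose the value of n yourself: the string is split at most n times (at the first n commas), giving up to n + 1 columns. Then select the column you need; for example, column 0 holds the first value.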
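A minimal sketch of this approach, assuming the same sample data as Answer 1; the new column name `Genre` is chosen here only for illustration. With `n=1` each string is split once, so column 0 is the first tag and column 1 is the remainder (with a leading space, hence the `strip()`).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sudoku', 'Reversi', 'Morocco'],
                   'Genres': ['Games, Strategy, Puzzle',
                              'Games, Strategy, Board',
                              'Games, Board, Strategy']})

# n=1: split at most once, so each row yields two columns:
# column 0 = text before the first comma, column 1 = everything after it
parts = df['Genres'].str.split(',', n=1, expand=True)

# Keep only the first value (hypothetical column name 'Genre');
# strip() removes any stray whitespace around it
df['Genre'] = parts[0].str.strip()
print(parts)
print(df[['Name', 'Genre']])
```

Keep in mind that this simply takes whichever tag comes first; in this sample every row starts with `Games`, so the more specific tags are dropped.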
