How can I display variable names instead of the column name in a Python DataFrame?

Question (votes: 0, answers: 2)

I'm currently learning the basics of data analysis with Python in Colab, using my IMDb watchlist as the dataset.

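In the Genres column, a single cell can list several genres (which makes things harder). I'm trying to compute the proportion of each genre in this dataset and then show it in a pie chart or perhaps a bar chart.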

[Image: dataset]

So I created one variable per genre that stores the value_counts() of its True/False matches, like this:

action = df['Genres'].str.contains('Action').value_counts()
animation = df['Genres'].str.contains('Animation').value_counts()
biography = df['Genres'].str.contains('Biography').value_counts()
comedy = df['Genres'].str.contains('Comedy').value_counts()
crime = df['Genres'].str.contains('Crime').value_counts()
drama = df['Genres'].str.contains('Drama').value_counts()
documentary = df['Genres'].str.contains('Documentary').value_counts()
family = df['Genres'].str.contains('Family').value_counts()
fantasy = df['Genres'].str.contains('Fantasy').value_counts()
film_noir = df['Genres'].str.contains('Film-Noir').value_counts()
history = df['Genres'].str.contains('History').value_counts()
horror = df['Genres'].str.contains('Horror').value_counts()
mystery = df['Genres'].str.contains('Mystery').value_counts()
music = df['Genres'].str.contains('Music').value_counts()
musical = df['Genres'].str.contains('Musical').value_counts()
romance = df['Genres'].str.contains('Romance').value_counts()
scifi = df['Genres'].str.contains('Sci-Fi').value_counts()
sport = df['Genres'].str.contains('Sport').value_counts()
thriller = df['Genres'].str.contains('Thriller').value_counts()
war = df['Genres'].str.contains('War').value_counts()
western = df['Genres'].str.contains('Western').value_counts()

Then I put these variables into a DataFrame:

genres = pd.DataFrame(
    [action, animation, biography,
     comedy, crime, drama,
     documentary, family, fantasy,
     film_noir, history, horror,
     mystery, music, musical,
     romance, scifi, sport,
     thriller, war, western],
    )
genres.head(5)

The problem is in the output:

[Image: output]

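I would like it to show the variable names instead of "Genres", which is what currently appears in the first column. Is that possible?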

python pandas dataframe dataset imdb
2 Answers

2 votes

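I think you can achieve this by building the DataFrame from a dictionary, where the keys are the genre names and the values are the corresponding counts. Here is an example: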

import pandas as pd

# Sample DataFrame
data = {'Genres': ['Action, Drama', 'Comedy, Romance', 'Action, Comedy', 'Drama', 'Comedy']}
df = pd.DataFrame(data)

# List of genres
genre_list = ['Action', 'Animation', 'Biography', 'Comedy', 'Crime', 'Drama', 'Documentary', 'Family',
              'Fantasy', 'Film-Noir', 'History', 'Horror', 'Mystery', 'Music', 'Musical', 'Romance',
              'Sci-Fi', 'Sport', 'Thriller', 'War', 'Western']

# Create a dictionary to store genre counts
genre_counts = {}

# Populate the dictionary with counts
for genre in genre_list:
    genre_counts[genre] = df['Genres'].str.contains(genre).sum()

# Create a DataFrame from the dictionary
genres_df = pd.DataFrame(list(genre_counts.items()), columns=['Genre', 'Count'])

# Display the DataFrame
print(genres_df)

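This code creates a dictionary (genre_counts) whose keys are the genre names and whose values are the number of rows in the Genres column that contain each genre. It then converts the dictionary into a DataFrame (genres_df) and prints it. The resulting DataFrame has Genre and Count columns, so each row is identified by its genre rather than by "Genres".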
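If the True/False breakdown from the question should be kept, one possible variant is to store each value_counts() Series in a dictionary under the desired label and transpose the result, so that the labels become the row index. This is only a minimal sketch, not the answerer's code: the sample data and the labels mapping are made up for illustration, and regex=False is an added choice so that the search text is treated literally.

```python
import pandas as pd

# Hypothetical sample data standing in for the IMDb watchlist
df = pd.DataFrame({'Genres': ['Action, Drama', 'Comedy, Romance',
                              'Action, Comedy', 'Drama', 'Comedy']})

# Illustrative mapping: desired row label -> text searched for in 'Genres'
labels = {'action': 'Action', 'comedy': 'Comedy',
          'drama': 'Drama', 'romance': 'Romance'}

# One value_counts() Series per label, stored under that label
per_genre = {
    name: df['Genres'].str.contains(text, regex=False).value_counts()
    for name, text in labels.items()
}

# Dictionary keys become column names; transposing turns them into row labels.
# fillna(0) covers a genre that has no True (or no False) rows at all.
genres = pd.DataFrame(per_genre).T.fillna(0).astype(int)

# Use plain strings for the column labels instead of booleans
genres.columns = genres.columns.map(str)

print(genres)
```

Because str.contains matches substrings, a search for 'Music' would also count rows that only contain 'Musical'; the full genre list in the question is affected by this.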


0 votes

A faster approach that avoids the relatively slow for loop:

Assume the following DataFrame:

                       Genres
0              Comedy, Horror
1          Comedy, Drama, War
2  Mistery, Romance, Thriller

Proposed code:

import pandas as pd

# create the original DataFrame
df = pd.DataFrame({'Genres': ['Comedy, Horror', 'Comedy, Drama, War', 'Mistery, Romance, Thriller']})

# split the genres by comma and explode the list into separate rows
df = df.assign(Genres=df['Genres'].str.split(',')).explode('Genres')

# collect the distinct genre values (not used in the steps below)
all_genres = df['Genres'].unique()

# Counting Matrix using crosstab method
genre_counts = pd.crosstab(index=df.index, columns=df['Genres'], margins=False).to_dict('index')

genre_counts = pd.DataFrame(genre_counts)

# count the number of 0s and 1s in each row
counts = ( genre_counts.apply(lambda row: [sum(row == 0), sum(row == 1)], axis=1) )

# Final count with 2 columns 'False' and 'True'
counts = pd.DataFrame(counts.tolist(), index=counts.index).rename(columns={0:'False', 1:'True'})

print(counts)

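Output (some genre names carry a leading space because str.split(',') keeps the space that follows each comma):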

           False  True
 Drama         2     1
 Horror        2     1
 Romance       2     1
 Thriller      2     1
 War           2     1
Comedy         1     2
Mistery        2     1
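As a possible shorter alternative to the crosstab steps, the same True/False table can be sketched with Series.str.get_dummies, which builds one 0/1 indicator column per distinct token. This is not the answerer's code; it assumes the genres are always separated by a comma followed by a single space, and it reuses the sample data from this answer (including its "Mistery" spelling).

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({'Genres': ['Comedy, Horror', 'Comedy, Drama, War',
                              'Mistery, Romance, Thriller']})

# One 0/1 indicator column per distinct genre; splitting on ', '
# (comma plus space) avoids leading spaces in the genre names
indicators = df['Genres'].str.get_dummies(sep=', ')

# Number of rows that contain each genre, and number that do not
true_counts = indicators.sum()
false_counts = len(df) - true_counts

# Genres become the row labels, with 'False' and 'True' columns
counts = pd.DataFrame({'False': false_counts, 'True': true_counts})

print(counts)
```

Since get_dummies compares whole tokens rather than substrings, genres such as Music and Musical would be counted separately here.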