.groupby().mean() is not working for me in data analysis


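I have started using Python for a data analysis project. I have an employee attrition dataset with a categorical column named Attrition that takes two values, 0 and 1; the rest of the dataset consists of int and object columns. My problem is that when I group the dataset by this categorical column with .groupby(), calling .mean() on the result fails.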


# import libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

# read the dataset 
df= pd.read_csv('C:/Users/LENOVO/Desktop/internship/HR-Employee.csv')
df.head()

# EDA (Exploratory Data Analysis): identify patterns through different data visualizations
df.shape  # 1470 rows with 35 columns (factors)
nullValues = df.isnull().sum().sum()
nullValues  # No null values in this dataset
duplicatedValues = df.duplicated().sum()
duplicatedValues  # No duplicated values in this dataset

df = df.replace(to_replace = ['Yes','No'],value = ['1','0'])
df = df.replace(to_replace = ['Travel_Rarely',
'Travel_Frequently','Non-Travel'],value = ['2','1','0'])
df = df.replace(to_replace = ['Married','Single','Divorced'],value = ['2','1','0'])
df = df.replace(to_replace = ['Male','Female'],value = ['1','0'])
#---
df = df.replace(to_replace = ['Human Resources','Research & Development','Sales'],value = ['0','1','2'])
df = df.replace(to_replace = ['Human Resources','Life Sciences','Marketing','Medical','Technical Degree','Other'],value = ['0','1','2','3','4','5'])
df = df.replace(to_replace = ['Healthcare Representative','Human Resources','Laboratory Technician','Manager','Manufacturing Director','Research Director','Research Scientist','Sales Executive','Sales Representative'],value = [0,1,2,3,4,5,6,7,8])



  
# drop unnecessary columns
DF = df.drop(columns=['EmployeeCount', 'Over18', 'StandardHours'])
# Let's see the information of our updated dataset DF
DF.info()
''' This dataset has 1470 samples and 32 attributes
(24 integer + 8 object). No variables have null/
missing values. '''

DF.describe()
left= DF.groupby('Attrition')
left.mean()
python function analytics data-analysis
1 answer

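The error you are getting, "unsupported operand type(s) for +: 'int' and 'str'", is enough to understand the problem. The field you are applying mean() to contains both int and string data.

Before applying mean(), try converting the data in the column to a single type.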
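As a minimal sketch of that advice, the snippet below builds a tiny hypothetical stand-in for the HR data (the rows and the OverTime values are invented for illustration), in which some columns hold digits stored as strings, the way they end up after replace() with values such as '1' and '0'. It assumes every column is meant to be numeric and converts them all with pd.to_numeric before grouping.

```python
import pandas as pd

# Hypothetical stand-in for the HR data: some columns hold digits as strings
df = pd.DataFrame({
    "Attrition": ["1", "0", "1", "0"],
    "Age": [41, 49, 37, 33],
    "OverTime": ["1", "0", "1", "0"],
})

# Convert each column to a numeric dtype; errors="coerce" turns values
# that cannot be parsed as numbers into NaN instead of raising
numeric_df = df.apply(pd.to_numeric, errors="coerce")

# With uniform numeric dtypes, the grouped mean can be computed
result = numeric_df.groupby("Attrition").mean()
print(result)
```

Replacing the labels with integers (0 and 1) rather than strings ('0' and '1') in the first place would avoid the conversion step altogether.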

There are many possible ways to check for mixed data types within a column; for example, you can use:

df.<column_name>.apply(type).value_counts()
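To show what that check reports, here is a small sketch on a made-up column (the name JobRole and its values are only illustrative) that mixes Python integers with strings.

```python
import pandas as pd

# Hypothetical column that mixes integers and digit strings
df = pd.DataFrame({"JobRole": [0, 1, "2", 3, "4"]})

# Count how many values of each Python type the column holds
type_counts = df["JobRole"].apply(type).value_counts()
print(type_counts)
```

More than one row in the output means the column holds mixed types and needs cleaning before an aggregation such as mean().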

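To demonstrate this, I took the Titanic dataset from Kaggle.

The "Sex" column has only two unique values, "male" and "female".

Now I convert the categorical "Sex" column into a numeric column.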
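The conversion step might look roughly like the sketch below. The rows are invented in the shape of the Titanic data rather than taken from the real file, and the 0/1 coding for "male"/"female" is an arbitrary choice made here for illustration.

```python
import pandas as pd

# Made-up rows in the shape of the Kaggle Titanic data (not real records)
titanic = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Survived": [0, 1, 1, 0],
})

# Map each category label to an integer code (the coding is an assumption)
titanic["Sex"] = titanic["Sex"].map({"male": 0, "female": 1})

print(titanic["Sex"].unique())
print(titanic.dtypes)
```

Mapping to integers rather than to the strings '0' and '1' keeps the column numeric, so later aggregations do not run into string values.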

Finally, I apply groupby() on the "Sex" column and then call mean().

Voilà, it worked.
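A sketch of the grouping step on an all-numeric frame, again with invented rows rather than the real Titanic records, and with "Sex" already coded as 0/1 (an assumed coding):

```python
import pandas as pd

# Made-up, fully numeric rows in the shape of the Titanic data
titanic = pd.DataFrame({
    "Sex": [0, 1, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Survived": [0, 1, 1, 0],
})

# Every remaining column is numeric, so the grouped mean is well defined
means = titanic.groupby("Sex").mean()
print(means)
```

The same pattern should carry over to grouping the HR data by Attrition once all of its columns are numeric.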
