I have the following categorical data:
['Self employed', 'Government Dependent',
'Formally employed Private', 'Informally employed',
'Formally employed Government', 'Farming and Fishing',
'Remittance Dependent', 'Other Income',
'Don't Know/Refuse to answer', 'No Income']
How can I put them into bins, for example like this:
['Government Dependent', 'Formally employed Government', 'Formally employed Private'] = 0
['Remittance Dependent', 'Informally employed','Self employed','Other Income'] = 1
['Dont Know/Refuse to answer', 'No Income','Farming and Fishing'] = 2
I already know how to bin numerical data into categories... Is it possible to do the reverse?
import pandas as pd
TRAIN = pd.read_csv("Train_v2.csv")
TRAIN['job_type'].unique()
output:
array(['Self employed', 'Government Dependent',
'Formally employed Private', 'Informally employed',
'Formally employed Government', 'Farming and Fishing',
'Remittance Dependent', 'Other Income',
'Dont Know/Refuse to answer', 'No Income'], dtype=object)
First create a dictionary of bins, swap it so that each category points to its bin number, and finally use Series.map:
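A minimal sketch of the dictionary approach, assuming the bins from the question and a small made-up DataFrame standing in for Train_v2.csv (the names `bins` and `category_to_bin` are illustrative, not from the original answer):

```python
import pandas as pd

# Bins from the question: bin label -> list of job_type categories
bins = {
    0: ['Government Dependent', 'Formally employed Government', 'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed', 'Self employed', 'Other Income'],
    2: ['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'],
}

# Swap the dictionary: each category becomes a key pointing at its bin label
category_to_bin = {category: label
                   for label, categories in bins.items()
                   for category in categories}

# Hypothetical stand-in for TRAIN so the example runs without Train_v2.csv
TRAIN = pd.DataFrame({'job_type': ['Self employed', 'Government Dependent',
                                   'No Income', 'Farming and Fishing']})

# Series.map looks up each value in the dict; values missing from it become NaN
TRAIN['new'] = TRAIN['job_type'].map(category_to_bin)
print(TRAIN)
```

Any job_type that is not in the dictionary is mapped to NaN, which also turns the new column into floats.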
Alternatively, if some values do not belong to category 0, 1 or 2, you can use numpy.select and set np.nan as the default value:
import numpy as np
m1 = TRAIN['job_type'].isin(['Government Dependent','Formally employed Government','Formally employed Private'])
m2 = TRAIN['job_type'].isin(['Remittance Dependent', 'Informally employed'])
m3 = TRAIN['job_type'].isin(['Dont Know/Refuse to answer', 'No Income'])
TRAIN['new'] = np.select([m1, m2, m3], [0, 1, 2], np.nan)
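A self-contained check of the numpy.select approach on a few hypothetical rows. 'Self employed' is deliberately left out of every condition, so it falls through to the default. The final `astype('Int64')` is an optional extra, not part of the answer: the NaN default makes the column float64, and pandas' nullable Int64 dtype keeps integer labels with `<NA>` for unmatched rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 'Self employed' matches none of the conditions below
TRAIN = pd.DataFrame({'job_type': ['Government Dependent', 'Informally employed',
                                   'No Income', 'Self employed']})

m1 = TRAIN['job_type'].isin(['Government Dependent', 'Formally employed Government',
                             'Formally employed Private'])
m2 = TRAIN['job_type'].isin(['Remittance Dependent', 'Informally employed'])
m3 = TRAIN['job_type'].isin(['Dont Know/Refuse to answer', 'No Income'])

# The first matching condition wins; rows matching none get the default np.nan
TRAIN['new'] = np.select([m1, m2, m3], [0, 1, 2], np.nan)

# Optional: the NaN default makes the column float64; convert to nullable integers
TRAIN['new'] = TRAIN['new'].astype('Int64')
print(TRAIN)
```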