How to find how many distinct values a dataset column contains in Colab

Question · Votes: 0 · Answers: 1

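I have a dataset of roughly 400,000 rows and 8 columns, and I just want to know how many different kinds of values one column contains. How can I do that? The values in the column are strings, and I need to assign numbers to them, so I need to find out how many distinct words the column has. I don't know where to start.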

machine-learning dataset google-colaboratory

1 Answer

0
votes

Consider an example in which one column, Names, holds values of different data types.

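import pandas as pd

# Example column holding values of several Python types
d = {'No.': [1, 2, 3, 4, 5, 6, 7], 'Names': ['Shahid', 1, 0.45, True, 'Ben', 3.5, 1]}
data = pd.DataFrame(d)
# Record the Python type of every value in the column
ndatatype = data['Names'].apply(type)
ndatatype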
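As a shortcut, the per-type tally can also be read off in one step with pandas' `value_counts`. This is only a minimal sketch on the same example data; the name `type_counts` is made up for illustration, and each type is mapped to its name so that the result is indexed by plain strings.

```python
import pandas as pd

d = {'No.': [1, 2, 3, 4, 5, 6, 7],
     'Names': ['Shahid', 1, 0.45, True, 'Ben', 3.5, 1]}
data = pd.DataFrame(d)

# Map each cell to the name of its Python type, then count each name.
type_counts = data['Names'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```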

Now loop over that output with a conditional that checks whether each type is a string, integer, float, or Boolean, and count each kind:

a = ndatatype
strings = 0
integers = 0
floats = 0
booleans = 0
# Tally each value's type; bool is matched separately because type(True) is bool, not int
for x in a:
    if x == str:
        strings += 1
    elif x == int:
        integers += 1
    elif x == float:
        floats += 1
    elif x == bool:
        booleans += 1
    else:
        print('not found')
print(f"the total strings are {strings}")
print(f"the total integers are {integers}")
print(f"the total floats are {floats}")
print(f"the total booleans are {booleans}")
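If the goal is instead what the question literally describes (a column of strings whose distinct values need counting and numbering), pandas' `nunique`, `value_counts`, and `factorize` may be closer to what is needed. The following is a rough sketch, not a definitive solution: the DataFrame `df`, the column `label`, and the new column `label_id` are hypothetical stand-ins for the real 400,000-row dataset.

```python
import pandas as pd

# Hypothetical column of string labels standing in for the real data.
df = pd.DataFrame({'label': ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']})

# Number of distinct values in the column (missing values are not counted).
n_unique = df['label'].nunique()

# How often each distinct value occurs.
counts = df['label'].value_counts()

# factorize assigns integer codes in order of first appearance
# and also returns the distinct values those codes refer to.
codes, uniques = pd.factorize(df['label'])
df['label_id'] = codes

print(n_unique)
print(counts)
print(df)
```

Here `uniques[i]` is the string that code `i` stands for, so the mapping can be reversed later if needed.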


© www.soinside.com 2019 - 2024. All rights reserved.