I'm trying to convert the string values in my data to numeric values using the map function.
Here is the data:
label sms_message
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
I want to change "spam" to 1 and "ham" to 0 with this:
df['label'] = df.label.map({'ham':0, 'spam':1})
But the result is:
label sms_message
0 NaN Go until jurong point, crazy.. Available only ...
1 NaN Ok lar... Joking wif u oni...
2 NaN Free entry in 2 a wkly comp to win FA Cup fina...
3 NaN U dun say so early hor... U c already then say...
4 NaN Nah I don't think he goes to usf, he lives aro...
Can anyone figure out the problem?
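Your code is correct. I think you executed the same statement twice (one after the other). Running the following statements in the Python interactive terminal makes this clear.
Note: when you pass a dictionary, map() replaces every value in the Series that does not match one of the dictionary's keys with NaN (I think that is what happened in your case, i.e. you ran the statement twice). See the pandas documentation for map() and apply(); it notes that when arg is a dictionary, values in the Series that are not in the dictionary (as keys) are converted to NaN.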
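A minimal sketch reproducing the effect outside the REPL, assuming only pandas: the second map() receives the already-converted integers, none of which are keys of the dictionary, so every value becomes NaN.

```python
import pandas as pd

labels = pd.Series(["ham", "ham", "spam", "ham", "ham"], name="label")
mapping = {"ham": 0, "spam": 1}

once = labels.map(mapping)   # the strings are keys of the dict -> 0/1
twice = once.map(mapping)    # 0/1 are not keys of the dict -> NaN

print(once.tolist())         # [0, 0, 1, 0, 0]
print(twice.isna().all())    # True
```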
>>> import pandas as pd
>>>
>>> d = {
... "label": ["ham", "ham", "spam", "ham", "ham"],
... "sms_messsage": [
... "Go until jurong point, crazy.. Available only ...",
... "Ok lar... Joking wif u oni...",
... "Free entry in 2 a wkly comp to win FA Cup fina...",
... "U dun say so early hor... U c already then say...",
... "Nah I don't think he goes to usf, he lives aro..."
... ]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
label sms_messsage
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
label sms_messsage
0 0 Go until jurong point, crazy.. Available only ...
1 0 Ok lar... Joking wif u oni...
2 1 Free entry in 2 a wkly comp to win FA Cup fina...
3 0 U dun say so early hor... U c already then say...
4 0 Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
label sms_messsage
0 NaN Go until jurong point, crazy.. Available only ...
1 NaN Ok lar... Joking wif u oni...
2 NaN Free entry in 2 a wkly comp to win FA Cup fina...
3 NaN U dun say so early hor... U c already then say...
4 NaN Nah I don't think he goes to usf, he lives aro...
>>>
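If the cell might be run more than once (common in notebooks), one option is to make the conversion idempotent. In this sketch the function name to_numeric_labels is hypothetical; it looks each value up with dict.get(v, v), so values that are already numeric pass through unchanged instead of turning into NaN.

```python
import pandas as pd

def to_numeric_labels(series: pd.Series) -> pd.Series:
    """Hypothetical helper: map 'ham'/'spam' to 0/1, leaving any other value as-is."""
    mapping = {"ham": 0, "spam": 1}
    return series.map(lambda v: mapping.get(v, v))

df = pd.DataFrame({"label": ["ham", "ham", "spam", "ham", "ham"]})
df["label"] = to_numeric_labels(df["label"])
df["label"] = to_numeric_labels(df["label"])  # running it again changes nothing
print(df["label"].tolist())  # [0, 0, 1, 0, 0]
```

The trade-off is that an unexpected label (for example a typo such as "spma") also passes through unchanged rather than being flagged.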
Three ways to do the conversion, demonstrated on a small sample DataFrame:
>>> import pandas as pd
>>>
>>> d = {
... "label": ['spam', 'ham', 'ham', 'ham', 'spam'],
... "sms_message": ["M1", "M2", "M3", "M4", "M5"]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
label sms_message
0 spam M1
1 ham M2
2 ham M3
3 ham M4
4 spam M5
>>>
Method 1: use map() with a dictionary argument
>>> new_values = {'spam': 1, 'ham': 0}
>>>
>>> df
label sms_message
0 spam M1
1 ham M2
2 ham M3
3 ham M4
4 spam M5
>>>
>>> df.label = df.label.map(new_values)
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
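If you would rather fail loudly than silently get NaN, you can check for values missing from the dictionary before mapping. A sketch, assuming a hypothetical helper named map_strict (not part of pandas):

```python
import pandas as pd

def map_strict(series: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: like Series.map(dict), but raises on unmapped values."""
    unknown = set(series.unique()) - set(mapping)
    if unknown:
        raise ValueError(f"values not in mapping: {sorted(map(str, unknown))}")
    return series.map(mapping)

df = pd.DataFrame({"label": ["spam", "ham", "ham", "ham", "spam"],
                   "sms_message": ["M1", "M2", "M3", "M4", "M5"]})
df["label"] = map_strict(df["label"], {"spam": 1, "ham": 0})
print(df["label"].tolist())  # [1, 0, 0, 0, 1]

# Running it a second time raises instead of producing a column of NaN.
second_run_failed = False
try:
    map_strict(df["label"], {"spam": 1, "ham": 0})
except ValueError as err:
    second_run_failed = True
    print(err)
```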
Method 2: use map() with a function argument (starting again from the original string labels)
>>> df.label = df.label.map(lambda v: 0 if v == 'ham' else 1)
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
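Note that the lambda above treats every value other than 'ham' as spam, including typos or values that were already converted. A different, vectorized technique (not map()) is to compare the column against 'spam' and cast the boolean result to int. A sketch on the same sample data; here the default goes the other way, so anything other than 'spam' becomes 0:

```python
import pandas as pd

df = pd.DataFrame({"label": ["spam", "ham", "ham", "ham", "spam"],
                   "sms_message": ["M1", "M2", "M3", "M4", "M5"]})

# Element-wise comparison gives a boolean Series; astype(int) turns True/False into 1/0.
df["label"] = (df["label"] == "spam").astype(int)
print(df["label"].tolist())  # [1, 0, 0, 0, 1]
```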
Method 3: use apply() with a function argument (again starting from the original string labels)
>>> df.label = df.label.apply(lambda v: 0 if v == "ham" else 1)
>>>
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
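One more option, shown only as a sketch: convert the column to a categorical dtype and take its integer codes. When categories are not given explicitly, pandas sorts them, so 'ham' gets 0 and 'spam' gets 1 here. That coincidence depends on the alphabetical order of the label names, so the explicit dictionary is the safer choice if the numbers matter.

```python
import pandas as pd

df = pd.DataFrame({"label": ["spam", "ham", "ham", "ham", "spam"],
                   "sms_message": ["M1", "M2", "M3", "M4", "M5"]})

# Inferred categories are sorted: ['ham', 'spam'] -> codes 0 and 1.
df["label"] = df["label"].astype("category").cat.codes
print(df["label"].tolist())  # [1, 0, 0, 0, 1]
```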
Thanks.
Maybe your problem is with the read_table call.
Try this:
df = pd.read_table('smsspamcollection/SMSSpamCollection',
sep='\t',
header=None,
names=['label', 'sms_message'])
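A self-contained sketch of the same call, with a two-line in-memory string (rows taken from the question) standing in for the real SMSSpamCollection file. With header=None and names=..., the first line is read as data rather than as column names, and the mapping then works as expected.

```python
import io
import pandas as pd

# Stand-in for the tab-separated file: one "label<TAB>message" pair per line.
raw = ("ham\tOk lar... Joking wif u oni...\n"
       "spam\tFree entry in 2 a wkly comp to win FA Cup fina...\n")

df = pd.read_table(io.StringIO(raw),
                   sep='\t',
                   header=None,
                   names=['label', 'sms_message'])
df['label'] = df.label.map({'ham': 0, 'spam': 1})
print(df['label'].tolist())  # [0, 1]
```

Without header=None and names, pandas would use the first line of the file as the header row, so there would be no label column to map.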