New column based on existing column values


I'm trying to create a new column with the values Chill or Frozen, based on the values of the existing column "Temp".

Temp contains the following values:

-18.00C
-20.00C
+10.00C
+19.00C
Nan
DRY

How can I do this with pandas?

import pandas as pd

df = pd.DataFrame({'Temp': ['-18.00C', '+10.00c', 'NaN', 'DRY']})

If Temp < 0.0C, it falls under Frozen.
If Temp > 0.0C, it falls under Chill.
If Temp is "NaN" or "DRY", it should be NA.

Expected output:

Temp_Category
Frozen
Chill
NA
NA
Tags: python pandas if-statement condition nan
1 Answer

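You can take the first character with str[0] and map it through a dictionary with Series.map, but this only works if every value always has a + or - sign before the number: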

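import numpy as np
import pandas as pd

df = pd.DataFrame({'Temp': ['-18.00C', '+10.00c', 'NaN', 'DRY', '+0c', '20c']})

# Map the leading sign character; any other first character gives NaN
d = {'-': 'Frozen', '+': 'Chill'}
df['new1'] = df['Temp'].str[0].map(d)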
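The dictionary approach returns NaN for a value without a sign, such as 20c. A minimal variant, assuming that a value starting directly with a digit is positive, can mark those rows as Chill with str.match and Series.mask (the column name new1b is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Temp': ['-18.00C', '+10.00c', 'NaN', 'DRY', '+0c', '20c']})
d = {'-': 'Frozen', '+': 'Chill'}

# Assumption: a value that starts with a digit (no sign) is positive
unsigned = df['Temp'].str.match(r'\d', na=False)
# Map the sign as before, then overwrite the unsigned rows with 'Chill'
df['new1b'] = df['Temp'].str[0].map(d).mask(unsigned, 'Chill')
print(df)
```

Values such as "NaN" and "DRY" still start with a letter, so they stay NaN.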

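Another idea is to extract the numeric value, convert it to float and use numpy.sign. Note that for 0 the sign is also 0, which would give NaN unless 0 is in the mapping dictionary, so here it is mapped to Chill: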

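# Optional sign and optional decimal part: matches -18.00, +10.00, +0 and 20
pat = r"([-+]?\d*\.?\d+)"
# np.sign returns -1.0, 0.0, 1.0 (or NaN); 0 is treated as Chill
d1 = {1: 'Chill', -1: 'Frozen', 0: 'Chill'}
df['new2'] = np.sign(df['Temp'].str.extract(pat, expand=False).astype(float)).map(d1)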
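In the pattern r"([-+]?\d*\.\d+|\d+)" the optional sign belongs only to the first (decimal) branch, so a signed integer such as -5c is extracted as 5 and would be classified as Chill. The sketch below compares it with a pattern in which the sign applies to both integers and decimals; the sample values are made up for illustration:

```python
import pandas as pd

s = pd.Series(['-5c', '-18.00C', '+0c', '20c'])

# Sign is part of the decimal branch only; '\d+' matches integers without it
alternation = r"([-+]?\d*\.\d+|\d+)"
# Optional sign in front of an integer or decimal number
signed = r"([-+]?\d*\.?\d+)"

print(s.str.extract(alternation, expand=False).astype(float).tolist())
print(s.str.extract(signed, expand=False).astype(float).tolist())
```

The first line prints 5.0 for -5c, the second keeps it as -5.0.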

A solution with two conditions and numpy.select:

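pat = r"([-+]?\d*\.?\d+)"
# expand=False returns a Series instead of a one-column DataFrame
s = df['Temp'].str.extract(pat, expand=False).astype(float)
# default=None marks unmatched rows as missing; with default=np.nan, depending on the
# NumPy version, NaN is either turned into the string 'nan' or a TypeError is raised
df['new3'] = np.select([s >= 0, s < 0], ['Chill', 'Frozen'], default=None)

If only the last character of each temperature value is non-numeric (e.g. c or C), you can remove it by slicing with str[:-1] and convert the rest with to_numeric:

# Drop the trailing unit letter; anything that is still non-numeric becomes NaN
s = pd.to_numeric(df['Temp'].str[:-1], errors='coerce')
df['new4'] = np.select([s >= 0, s < 0], ['Chill', 'Frozen'], default=None)
print(df)
      Temp    new1    new2    new3    new4
0  -18.00C  Frozen  Frozen  Frozen  Frozen
1  +10.00c   Chill   Chill   Chill   Chill
2      NaN     NaN     NaN    None    None
3      DRY     NaN     NaN    None    None
4      +0c   Chill   Chill   Chill   Chill
5      20c     NaN   Chill   Chill   Chill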
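To produce the single Temp_Category column from the question, the steps can be wrapped in a small function. This is a sketch under some assumptions: temp_category is a hypothetical name, the unit is a trailing c or C that str.rstrip removes (so values without a unit, such as 20, also work), and pd.cut with right=False is swapped in for numpy.select so that exactly 0 falls under Chill. Both the string "NaN" and a real missing value (np.nan) end up as NaN:

```python
import numpy as np
import pandas as pd


def temp_category(temp: pd.Series) -> pd.Series:
    """Hypothetical helper: 'Frozen' below 0, 'Chill' at or above 0, NaN otherwise."""
    # Drop a trailing unit letter (c or C); anything still non-numeric becomes NaN
    values = pd.to_numeric(temp.str.rstrip('cC'), errors='coerce')
    # right=False gives the bins [-inf, 0) and [0, inf), so exactly 0 counts as Chill
    return pd.cut(values, bins=[-np.inf, 0, np.inf],
                  labels=['Frozen', 'Chill'], right=False)


df = pd.DataFrame({'Temp': ['-18.00C', '+10.00c', 'NaN', np.nan, 'DRY', '20']})
df['Temp_Category'] = temp_category(df['Temp'])
print(df)
```

The result is a categorical column; call .astype(object) on it if plain strings are preferred.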