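I have a pandas DataFrame of sports data. The columns are Name, Age, BirthCity, BirthCountry, Rookie, Weight, and Problem. In the raw data, BirthCity for US players is "City, State", so when I split on the comma delimiter that field became two variables. As a result every US player's row is shifted one column to the right, and I had to create a "Problem" variable to hold the overflow.
How can I shift the US rows back to the left across thousands of observations? Thanks!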
What I have (please excuse the table formatting):
Name   Age  BirthCity  BirthCountry  Rookie  Weight  Problem
Frank  32   Seattle    WA            USA     N       200
Jake   24   Geneva     Switzerland   Y       210
Desired:
Name   Age  BirthCity  BirthCountry  Rookie  Weight
Frank  32   Seattle    USA           N       200
Jake   24   Geneva     Switzerland   Y       210
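For context, the misalignment can be reproduced with a plain comma split. This is only a sketch: the two raw lines below are hypothetical, since the actual layout of the source file is not shown in the question.

```python
# Hypothetical raw lines; the real file layout is an assumption.
us_line = "Frank,32,Seattle, WA,USA,N,200"
intl_line = "Jake,24,Geneva,Switzerland,Y,210"

# Splitting on every comma also splits "Seattle, WA" into two fields.
us_fields = [field.strip() for field in us_line.split(",")]
intl_fields = [field.strip() for field in intl_line.split(",")]

# The US row ends up one field longer than the non-US row.
print(len(us_fields), us_fields)
print(len(intl_fields), intl_fields)
```

The extra field is what pushes every later value in the US rows one column to the right.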
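One approach is to delete the value at position 3 in each affected row (remember that Python counts from 0) while appending a NaN so the row keeps its length, and then drop the now-empty Problem column.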
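import numpy as np
import pandas as pd

# df: start with this DataFrame
#
#     Name  Age BirthCity BirthCountry Rookie Weight  Problem
# 0  Frank   32   Seattle           WA    USA      N    200.0
# 1   Jake   24    Geneva  Switzerland      Y    210      NaN

def shifter(row):
    # Drop the stray state value at position 3 and pad the end with NaN
    # so the row keeps its original length.
    values = np.hstack((np.delete(row.to_numpy(), [3]), [np.nan]))
    # Returning a Series keeps the result aligned with the original columns.
    return pd.Series(values, index=row.index)

# Misaligned rows are the ones where the country ended up in Rookie.
mask = df['Rookie'] == 'USA'
df.loc[mask, :] = df.loc[mask, :].apply(shifter, axis=1)
df = df.drop(['Problem'], axis=1)

#     Name  Age BirthCity BirthCountry Rookie Weight
# 0  Frank   32   Seattle          USA      N    200
# 1   Jake   24    Geneva  Switzerland      Y    210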
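Since the question mentions thousands of rows, a row-wise `apply` may become slow. Below is a minimal sketch of the same idea applied to the whole block of affected rows at once with NumPy. The sample frame and the names `values`, `shifted`, `pad`, and `fixed` are illustrative assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question; the values are illustrative.
df = pd.DataFrame({
    "Name": ["Frank", "Jake"],
    "Age": [32, 24],
    "BirthCity": ["Seattle", "Geneva"],
    "BirthCountry": ["WA", "Switzerland"],
    "Rookie": ["USA", "Y"],
    "Weight": ["N", 210],
    "Problem": [200.0, np.nan],
})

# Boolean array marking the misaligned (US) rows.
mask = (df["Rookie"] == "USA").to_numpy()

# Work on an object-dtype copy so mixed values can move between columns.
values = df.to_numpy(dtype=object, copy=True)

# Remove column 3 from the affected rows and pad with NaN on the right.
shifted = np.delete(values[mask], 3, axis=1)
pad = np.full((shifted.shape[0], 1), np.nan, dtype=object)
values[mask] = np.hstack([shifted, pad])

# Rebuild the frame, drop the overflow column, and restore integer weights.
fixed = pd.DataFrame(values, index=df.index, columns=df.columns)
fixed = fixed.drop(columns="Problem")
fixed["Weight"] = fixed["Weight"].astype(int)
print(fixed)
```

Rebuilding from an object array leaves the other columns as object dtype, so any column that needs a numeric type would have to be cast explicitly, as is done for Weight here.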
It is not quite that easy, because the columns have to be cast to strings before they can be shifted:
# Select the misaligned rows with a boolean mask
mask = df['Rookie'] == 'USA'
c = ['BirthCountry', 'Rookie', 'Weight', 'Problem']
# Shift those columns one position to the left; casting to str first is necessary
df.loc[mask, c] = df.loc[mask, c].astype(str).shift(-1, axis=1)
# Convert Weight to float and then to int
df['Weight'] = df['Weight'].astype(float).astype(int)
# Remove the Problem column
df = df.drop('Problem', axis=1)
print(df)
    Name  Age BirthCity BirthCountry Rookie  Weight
0  Frank   32   Seattle          USA      N     200
1   Jake   24    Geneva  Switzerland      Y     210
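If the round trip through strings is unwanted, one possible alternative is to copy the raw values one column to the left for the affected rows. This is a sketch under the assumption that the three target columns are already object dtype, as they are in the sample frame built here. The name `cols` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question; the values are illustrative.
df = pd.DataFrame({
    "Name": ["Frank", "Jake"],
    "Age": [32, 24],
    "BirthCity": ["Seattle", "Geneva"],
    "BirthCountry": ["WA", "Switzerland"],
    "Rookie": ["USA", "Y"],
    "Weight": ["N", 210],
    "Problem": [200.0, np.nan],
})

mask = df["Rookie"] == "USA"
cols = ["BirthCountry", "Rookie", "Weight", "Problem"]

# For the misaligned rows only, overwrite each column with the values
# of the column to its right. to_numpy() bypasses label alignment.
df.loc[mask, cols[:-1]] = df.loc[mask, cols[1:]].to_numpy()

# The overflow column is no longer needed; Weight can become an integer.
df = df.drop(columns="Problem")
df["Weight"] = df["Weight"].astype(int)
print(df)
```

Using `to_numpy()` on the right-hand side matters: assigning a DataFrame would align on column labels and undo the shift.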