我有数据帧,即
Input Dataframe
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 Nan salem
3 III A Eng 80 gphss Nan
4 III A Mat 45 Nan salem
5 III A Eng 40 gphss Nan
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss Nan
当“class”和“section”列中的值匹配时,我需要替换“school”和“city”中的“Nan”。结果假设是输入数据帧
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 jghss salem
3 III A Eng 80 gphss salem
4 III A Mat 45 gphss salem
5 III A Eng 40 gphss salem
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss salem
任何人都可以帮助我吗?
使用lambda function
列表中指定的列中的DataFrame.groupby
每组使用向前和向后填充缺失值 - 对于每个组合,每组需要相同的值:
cols = ['school','city']
df[cols] = df.groupby(['class','section'])[cols].apply(lambda x: x.ffill().bfill())
print (df)
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 jghss salem
3 III A Eng 80 gphss salem
4 III A Mat 45 gphss salem
5 III A Eng 40 gphss salem
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss salem
假设每对class
和section
对应一对独特的school
和city
,我们可以使用groupby
:
# create a dictionary of class and section with school and city
# here we assume that for each pair and class there's a row with both school and city
# if that's not the case, we can separate the two series
school_city_dict = df[['class', 'section','school','city']].dropna().\
groupby(['class', 'section'])[['school','city']].\
max().to_dict()
# school_city_dict = {'school': {('I', 'A'): 'jghss', ('III', 'A'): 'gphss'},
# 'city': {('I', 'A'): 'salem', ('III', 'A'): 'salem'}}
# set index, prepare for map function
df.set_index(['class','section'], inplace=True)
df.loc[:,'school'] = df.index.map(school_city_dict['school'])
df.loc[:,'city'] = df.index.map(school_city_dict['city'])
# reset index to the original
df.reset_index()