I have a huge DataFrame with a MultiIndex. I would like to create new columns based on parts of the MultiIndex values. This is what I have:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.randn(10), index=arrays)
s.index.names = ['p1', 'p2', 'p3']
s
                         0
p1  p2    p3
bar one   green  -0.676472
    two   green  -0.030377
    three blue   -0.957517
baz one   blue    0.710764
    four  black   0.404377
foo one   black  -0.286358
    two   orange -1.620832
    eight green   0.316170
qux one   blue   -0.433310
    two   black   1.127754
This is what I want:
                         0  x1  x2  x3
p1  p2    p3
bar one   green   1.563381   1   0   1
    two   green   0.193622   0   0   0
    three blue    0.046728   1   0   0
baz one   blue    0.098216   0   0   0
    four  black   1.826574   0   1   0
foo one   black  -0.120856   1   1   1
    two   orange  0.605020   0   0   0
    eight green   0.693606   0   0   0
qux one   blue    0.588244   1   1   1
    two   black  -0.872104   1   1   1
Now, in pseudocode, I want:
if (p1 =='bar') & (p2 == 'one') & (p3 == 'green'): s['x1'] = 1, s['x3'] = 1
if (p1 == 'bar') & (p3 == 'blue'): s['x1'] = 1
if (p1 == 'baz') & (p3 == 'black'): s['x2'] = 1
if (p1 =='foo') & (p2 == 'one') & (p3 == 'black'): s['x1'] = 1, s['x2'] = 1, s['x3'] = 1
if (p1 == 'qux'): s['x1'] = 1, s['x2'] = 1, s['x3'] = 1
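Based on the values of the MultiIndex levels, I want to assign 1 to the new x columns. I am looking for a vectorized approach along the lines of numpy.select(condition, choice), but I could not get numpy.select to work with multiple choices per condition.
Because I have 14 index levels, I would like to refer to the levels explicitly by name in my conditions, i.e. (p1 == 'bar') & (p2 == 'one') is preferred over ['bar', 'one', ].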
Any guidance would be much appreciated. Thanks for your help!
One possible approach is to select rows with IndexSlice and set the columns to 1, like this:
idx = pd.IndexSlice
# start with every flag column set to 0
s = s.assign(x1=0, x2=0, x3=0)
# a bare ':' matches any value in that level
s.loc[idx['bar', 'one', 'green'], ['x1', 'x3']] = 1
s.loc[idx['bar', :, 'blue'], ['x1']] = 1
s.loc[idx['baz', :, 'black'], ['x2']] = 1
s.loc[idx['foo', 'one', 'black'], ['x1', 'x2', 'x3']] = 1
s.loc[idx['qux', :, :], ['x1', 'x2', 'x3']] = 1
print(s)
                         0  x1  x2  x3
p1  p2    p3
bar one   green   0.152556   1   0   1
    two   green   0.488762   0   0   0
    three blue    0.037346   1   0   0
baz one   blue    1.903518   0   0   0
    four  black   0.589922   0   1   0
foo one   black   0.871984   1   1   1
    two   orange  0.514062   0   0   0
    eight green  -0.177246   0   0   0
qux one   blue    0.740046   1   1   1
    two   black   0.755664   1   1   1
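As a side note on what the IndexSlice key actually is, here is a minimal sketch. The values in column 0 are placeholders rather than the random numbers from the question, and the call to sort_index() is only a precaution taken in this sketch, not something the answer above relies on.

```python
import numpy as np
import pandas as pd

arrays = [
    ['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
    ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black'],
]
index = pd.MultiIndex.from_arrays(arrays, names=['p1', 'p2', 'p3'])
# Placeholder data (assumption): the question uses np.random.randn(10).
# Sorting the index is a precaution for this sketch only.
s = pd.DataFrame({0: np.arange(10.0)}, index=index).sort_index()

idx = pd.IndexSlice
# idx[...] only builds the key; a bare ':' becomes slice(None), meaning "any value".
key = idx['bar', :, 'blue']
same_key = key == ('bar', slice(None), 'blue')

s = s.assign(x1=0)
s.loc[key, ['x1']] = 1
# Index tuples of the rows that were flagged.
flagged = s.index[s['x1'] == 1].tolist()
print(same_key, flagged)
```

So each position in the key corresponds to one index level in order, which is why this style gets hard to read once there are many levels.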
EDIT: To refer to the index levels by name, compare get_level_values for each level and combine the resulting masks:
def get_i(lev, val):
    # boolean mask: True where index level `lev` equals `val`
    return s.index.get_level_values(lev) == val

s = s.assign(x1=0, x2=0, x3=0)
s.loc[get_i('p1', 'bar') & get_i('p2', 'one') & get_i('p3', 'green'), ['x1', 'x3']] = 1
s.loc[get_i('p1', 'bar') & get_i('p3', 'blue'), ['x1']] = 1
s.loc[get_i('p1', 'baz') & get_i('p3', 'black'), ['x2']] = 1
s.loc[get_i('p1', 'foo') & get_i('p2', 'one') & get_i('p3', 'black'), ['x1', 'x2', 'x3']] = 1
s.loc[get_i('p1', 'qux'), ['x1', 'x2', 'x3']] = 1
print(s)
                         0  x1  x2  x3
p1  p2    p3
bar one   green  -0.029773   1   0   1
    two   green  -1.505461   0   0   0
    three blue    1.819085   1   0   0
baz one   blue    0.645498   0   0   0
    four  black  -1.119554   0   1   0
foo one   black   1.002072   1   1   1
    two   orange -0.461030   0   0   0
    eight green  -2.565080   0   0   0
qux one   blue    0.286967   1   1   1
    two   black  -0.522340   1   1   1
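With 14 levels and many rules, the same idea can be driven from a table of rules. The following is only a sketch: the names rules and level_mask are made up for illustration, and column 0 holds placeholder values instead of the random numbers from the question.

```python
import numpy as np
import pandas as pd

arrays = [
    ['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
    ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black'],
]
index = pd.MultiIndex.from_arrays(arrays, names=['p1', 'p2', 'p3'])
# Placeholder data (assumption): the question uses np.random.randn(10).
s = pd.DataFrame({0: np.arange(10.0)}, index=index)

# Hypothetical rule table: ({level name: required value}, [columns to set to 1]).
rules = [
    ({'p1': 'bar', 'p2': 'one', 'p3': 'green'}, ['x1', 'x3']),
    ({'p1': 'bar', 'p3': 'blue'}, ['x1']),
    ({'p1': 'baz', 'p3': 'black'}, ['x2']),
    ({'p1': 'foo', 'p2': 'one', 'p3': 'black'}, ['x1', 'x2', 'x3']),
    ({'p1': 'qux'}, ['x1', 'x2', 'x3']),
]

def level_mask(frame, conditions):
    """Return a boolean array that is True where every named level equals its value."""
    mask = np.ones(len(frame), dtype=bool)
    for level, value in conditions.items():
        mask &= np.asarray(frame.index.get_level_values(level) == value)
    return mask

s = s.assign(x1=0, x2=0, x3=0)
for conditions, columns in rules:
    s.loc[level_mask(s, conditions), columns] = 1
print(s)
```

Levels that a rule does not mention are simply left unconstrained, so each rule only names the levels it cares about.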
Extending @jezrael's solution: combining query with IndexSlice makes it possible to use the index level names.
# conditions: query() can refer to index levels by name
cond1 = s.query('p1=="bar" and p2=="one" and p3=="green"').index
cond2 = s.query('p1=="bar" and p3=="blue"').index
cond3 = s.query('p1=="baz" and p3=="black"').index
cond4 = s.query('p1=="foo" and p2=="one" and p3=="black"').index
cond5 = s.query('p1=="qux"').index
idx = pd.IndexSlice
# create zero columns
s = s.assign(x1=0, x2=0, x3=0)
# assign values
s.loc[idx[cond1], ['x1', 'x3']] = 1
s.loc[idx[cond2], ['x1']] = 1
s.loc[idx[cond3], ['x2']] = 1
s.loc[idx[cond4], ['x1', 'x2', 'x3']] = 1
s.loc[idx[cond5], ['x1', 'x2', 'x3']] = 1
                         0  x1  x2  x3
p1  p2    p3
bar one   green   1.122544   1   0   1
    two   green   0.157234   0   0   0
    three blue    0.760863   1   0   0
baz one   blue   -0.194400   0   0   0
    four  black   0.937159   0   1   0
foo one   black  -0.986325   1   1   1
    two   orange -0.002486   0   0   0
    eight green   0.067649   0   0   0
qux one   blue    1.024345   1   1   1
    two   black   0.884644   1   1   1
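Since each x column only ever holds 0 or 1, another option is to OR together the conditions that set it, which sidesteps numpy.select altogether. This is a sketch and not part of either answer above; the names lv and c1 to c5 are made up, and column 0 holds placeholder values. It writes the conditions in the (p1 == 'bar') & (p2 == 'one') style the question asks for by first turning the index levels into ordinary columns.

```python
import numpy as np
import pandas as pd

arrays = [
    ['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
    ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black'],
]
index = pd.MultiIndex.from_arrays(arrays, names=['p1', 'p2', 'p3'])
# Placeholder data (assumption): the question uses np.random.randn(10).
s = pd.DataFrame({0: np.arange(10.0)}, index=index)

# Index levels as plain columns, in the same row order as s.
lv = s.index.to_frame(index=False)

c1 = (lv['p1'] == 'bar') & (lv['p2'] == 'one') & (lv['p3'] == 'green')
c2 = (lv['p1'] == 'bar') & (lv['p3'] == 'blue')
c3 = (lv['p1'] == 'baz') & (lv['p3'] == 'black')
c4 = (lv['p1'] == 'foo') & (lv['p2'] == 'one') & (lv['p3'] == 'black')
c5 = lv['p1'] == 'qux'

# A column is 1 wherever any of the conditions that set it holds.
# to_numpy() drops lv's RangeIndex so the assignment is positional.
s['x1'] = (c1 | c2 | c4 | c5).to_numpy().astype(int)
s['x2'] = (c3 | c4 | c5).to_numpy().astype(int)
s['x3'] = (c1 | c4 | c5).to_numpy().astype(int)
print(s)
```

The trade-off is that the rules are regrouped per target column rather than listed per condition, as they are in the pseudocode.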