Mapping from a MultiIndex to multiple columns

Question

I have a huge MultiIndex DataFrame, and I want to create new columns based on parts of the MultiIndex. This is what I have:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.randn(10), index=arrays)
s.index.names = ['p1','p2','p3']

s
                         0
p1  p2    p3              
bar one   green  -0.676472
    two   green  -0.030377
    three blue   -0.957517
baz one   blue    0.710764
    four  black   0.404377
foo one   black  -0.286358
    two   orange -1.620832
    eight green   0.316170
qux one   blue   -0.433310
    two   black   1.127754

This is what I want:

                         0  x1  x2  x3
p1  p2    p3                          
bar one   green   1.563381   1   0   1
    two   green   0.193622   0   0   0
    three blue    0.046728   1   0   0
baz one   blue    0.098216   0   0   0
    four  black   1.826574   0   1   0
foo one   black  -0.120856   1   1   1
    two   orange  0.605020   0   0   0
    eight green   0.693606   0   0   0
qux one   blue    0.588244   1   1   1
    two   black  -0.872104   1   1   1

Now, in pseudocode, I want to:

if (p1 =='bar') & (p2 == 'one') & (p3 == 'green'): s['x1'] = 1, s['x3'] = 1
if (p1 == 'bar') & (p3 == 'blue'): s['x1'] = 1
if (p1 == 'baz') & (p3 == 'black'): s['x2'] = 1
if (p1 =='foo') & (p2 == 'one') & (p3 == 'black'): s['x1'] = 1, s['x2'] = 1, s['x3'] = 1
if (p1 == 'qux'): s['x1'] = 1, s['x2'] = 1, s['x3'] = 1

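Based on the values in the MultiIndex levels, I want to set the new x columns to 1. I'm looking for a vectorized approach like numpy.select(condition, choice), but I can't get numpy.select to work when a single condition has to set several columns.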
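A minimal sketch of how np.select can still be used here, assuming the example frame from above (rebuilt so the block runs on its own). np.select picks exactly one choice per row, so a single call cannot fill three columns at once; calling it once per target column, with only the rules that set that column, sidesteps that. The names r1 to r5 and targets are illustrative, not part of the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (the values are random).
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.randn(10),
                 index=pd.MultiIndex.from_arrays(arrays, names=['p1', 'p2', 'p3']))

# Pull each level out by name so the rules read like the pseudocode.
p1 = s.index.get_level_values('p1')
p2 = s.index.get_level_values('p2')
p3 = s.index.get_level_values('p3')

# One boolean mask per rule from the question.
r1 = (p1 == 'bar') & (p2 == 'one') & (p3 == 'green')
r2 = (p1 == 'bar') & (p3 == 'blue')
r3 = (p1 == 'baz') & (p3 == 'black')
r4 = (p1 == 'foo') & (p2 == 'one') & (p3 == 'black')
r5 = p1 == 'qux'

# np.select returns one value per row, so call it once per target column,
# passing only the rules that set that column.
targets = {'x1': [r1, r2, r4, r5], 'x2': [r3, r4, r5], 'x3': [r1, r4, r5]}
for col, conds in targets.items():
    s[col] = np.select(conds, [1] * len(conds), default=0)

print(s)
```

Because every choice is 1, each call is equivalent to np.logical_or.reduce(conds).astype(int); np.select only pays off if different rules should write different values.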

Because I have 14 index columns, I'd like to refer to my condition levels explicitly by name (i.e. (p1 == 'bar') & (p2 == 'one') is preferred over ['bar','one',]).

Any guidance would be greatly appreciated. Thanks for your help!

Answer 1

It's possible to select rows with slicers (pd.IndexSlice) and set the columns to 1 like this:

idx = pd.IndexSlice
# start with all-zero flag columns
s = s.assign(x1=0, x2=0, x3=0)
# ':' means "any value in this level"
s.loc[idx['bar','one','green'], ['x1','x3']] = 1
s.loc[idx['bar',:,'blue'], ['x1']] = 1
s.loc[idx['baz',:,'black'], ['x2']] = 1
s.loc[idx['foo','one','black'], ['x1','x2','x3']] = 1
s.loc[idx['qux',:,:], ['x1','x2','x3']] = 1

print(s)
                         0  x1  x2  x3
p1  p2    p3                          
bar one   green   0.152556   1   0   1
    two   green   0.488762   0   0   0
    three blue    0.037346   1   0   0
baz one   blue    1.903518   0   0   0
    four  black   0.589922   0   1   0
foo one   black   0.871984   1   1   1
    two   orange  0.514062   0   0   0
    eight green  -0.177246   0   0   0
qux one   blue    0.740046   1   1   1
    two   black   0.755664   1   1   1
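To see why the ':' entries work: pd.IndexSlice does no selecting itself. It just returns the key it is indexed with, turning each ':' into slice(None), which .loc reads as "every label in this level". A small sketch on a toy three-row frame (df and its values are made up for illustration):

```python
import pandas as pd

idx = pd.IndexSlice

# IndexSlice hands back the tuple it is indexed with;
# ':' becomes slice(None), meaning "all labels in this level".
key = idx['bar', :, 'blue']
print(key)  # ('bar', slice(None, None, None), 'blue')

# Toy frame with the same level names as the question.
mi = pd.MultiIndex.from_arrays(
    [['bar', 'bar', 'baz'], ['one', 'three', 'four'], ['green', 'blue', 'black']],
    names=['p1', 'p2', 'p3'])
df = pd.DataFrame({'v': [1.0, 2.0, 3.0]}, index=mi).assign(x1=0)

# Same kind of assignment as in the answer: rows with p1='bar' and p3='blue'.
df.loc[key, 'x1'] = 1
print(df)
```

If you later slice with label ranges (for example 'bar':'baz') rather than a bare ':', pandas generally expects a lexsorted index, so calling s.sort_index() first is a common precaution.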

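EDIT: Extending @Jezrael's solution: to refer to the levels by name, combine the same .loc assignments with boolean masks built from get_level_values: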

# boolean mask: True where index level `lev` equals `val`
def get_i(lev, val):
    return s.index.get_level_values(lev) == val

s = s.assign(x1=0, x2=0, x3=0)
s.loc[get_i('p1','bar') & get_i('p2','one') & get_i('p3','green'), ['x1','x3']] = 1
s.loc[get_i('p1','bar') & get_i('p3','blue'), ['x1']] = 1
s.loc[get_i('p1','baz') & get_i('p3','black'), ['x2']] = 1
s.loc[get_i('p1','foo') & get_i('p2','one') & get_i('p3','black'), ['x1','x2','x3']] = 1
s.loc[get_i('p1','qux'), ['x1','x2','x3']] = 1

print(s)
                         0  x1  x2  x3
p1  p2    p3                          
bar one   green  -0.029773   1   0   1
    two   green  -1.505461   0   0   0
    three blue    1.819085   1   0   0
baz one   blue    0.645498   0   0   0
    four  black  -1.119554   0   1   0
foo one   black   1.002072   1   1   1
    two   orange -0.461030   0   0   0
    eight green  -2.565080   0   0   0
qux one   blue    0.286967   1   1   1
    two   black  -0.522340   1   1   1
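With 14 index levels, writing get_i calls by hand gets long. One way to scale the same idea, sketched under the assumption that every rule is a set of equality tests ANDed together, is to keep the rules in a table keyed by level name and loop over it. rules and mask_for are hypothetical names, not pandas API:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (the values are random).
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.randn(10),
                 index=pd.MultiIndex.from_arrays(arrays, names=['p1', 'p2', 'p3']))

# Hypothetical rule table: {level name: required value} -> columns to set to 1.
rules = [
    ({'p1': 'bar', 'p2': 'one', 'p3': 'green'}, ['x1', 'x3']),
    ({'p1': 'bar', 'p3': 'blue'}, ['x1']),
    ({'p1': 'baz', 'p3': 'black'}, ['x2']),
    ({'p1': 'foo', 'p2': 'one', 'p3': 'black'}, ['x1', 'x2', 'x3']),
    ({'p1': 'qux'}, ['x1', 'x2', 'x3']),
]

def mask_for(index, conditions):
    """AND together one equality test per named level (hypothetical helper)."""
    mask = np.ones(len(index), dtype=bool)
    for level, value in conditions.items():
        mask &= index.get_level_values(level) == value
    return mask

s = s.assign(x1=0, x2=0, x3=0)
for conditions, cols in rules:
    s.loc[mask_for(s.index, conditions), cols] = 1

print(s)
```

Levels that a rule does not mention are simply left unconstrained, which plays the role of ':' in the IndexSlice version.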

Answer 2

Extending @jezrael's solution: combining query with IndexSlice makes it possible to use the index level names.

# conditions: query can refer to index levels by name
cond1 = s.query('p1=="bar" and p2=="one" and p3=="green"').index
cond2 = s.query('p1=="bar" and p3=="blue"').index
cond3 = s.query('p1=="baz" and p3=="black"').index
cond4 = s.query('p1=="foo" and p2=="one" and p3=="black"').index
cond5 = s.query('p1=="qux"').index

idx = pd.IndexSlice

# create zero columns
s = s.assign(x1=0, x2=0, x3=0)

# assign values
s.loc[idx[cond1], ['x1','x3']] = 1
s.loc[idx[cond2], ['x1']] = 1
s.loc[idx[cond3], ['x2']] = 1
s.loc[idx[cond4], ['x1','x2','x3']] = 1
s.loc[idx[cond5], ['x1','x2','x3']] = 1

print(s)

                         0  x1  x2  x3
p1  p2    p3                          
bar one   green   1.122544   1   0   1
    two   green   0.157234   0   0   0
    three blue    0.760863   1   0   0
baz one   blue   -0.194400   0   0   0
    four  black   0.937159   0   1   0
foo one   black  -0.986325   1   1   1
    two   orange -0.002486   0   0   0
    eight green   0.067649   0   0   0
qux one   blue    1.024345   1   1   1
    two   black   0.884644   1   1   1
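Two details about this answer, shown in a self-contained sketch: pd.IndexSlice returns its argument unchanged, so idx[cond1] is just cond1 and s.loc[cond1, [...]] behaves the same; and with many levels the query strings can be generated from a dict. to_query is a hypothetical helper, and it assumes plain string values whose repr() is a valid query literal.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (the values are random).
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'three', 'one', 'four', 'one', 'two', 'eight', 'one', 'two'],
          ['green', 'green', 'blue', 'blue', 'black', 'black', 'orange', 'green', 'blue', 'black']]
s = pd.DataFrame(np.random.randn(10),
                 index=pd.MultiIndex.from_arrays(arrays, names=['p1', 'p2', 'p3']))

idx = pd.IndexSlice
cond = s.query('p1 == "qux"').index
# IndexSlice returns its argument as-is, so wrapping a MultiIndex changes nothing.
print(idx[cond] is cond)  # True

def to_query(conditions):
    """Hypothetical helper: {level: value} -> "level == 'value' and ..."."""
    return ' and '.join(f'{level} == {value!r}' for level, value in conditions.items())

q = to_query({'p1': 'foo', 'p2': 'one', 'p3': 'black'})
print(q)  # p1 == 'foo' and p2 == 'one' and p3 == 'black'

s = s.assign(x1=0, x2=0, x3=0)
# A MultiIndex returned by query can be passed straight to .loc.
s.loc[s.query(q).index, ['x1', 'x2', 'x3']] = 1
s.loc[cond, ['x1', 'x2', 'x3']] = 1
print(s)
```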
