Select DataFrame rows by row name using a case-insensitive match (like `grep -i`)

Question

I have a data frame that looks like this:

In [1]: mydict = {"1421293_at Hdgfl1":[2.140412,1.143337,3.260313],
                  "1429877_at Lrriq3":[9.019368,0.874524,2.051820]}

In [3]: import pandas as pd

In [4]:  df = pd.DataFrame.from_dict(mydict, orient='index')

In [5]: df
Out[5]:
                          0         1         2
1421293_at Hdgfl1  2.140412  1.143337  3.260313
1429877_at Lrriq3  9.019368  0.874524  2.051820

What I want to do is select rows by row name using a case-insensitive query. For example, given the query "hdgfl1", it should return:

                          0         1         2
1421293_at Hdgfl1  2.140412  1.143337  3.260313

Here "hdgfl1" is a case-insensitive query against "1421293_at Hdgfl1". It is basically the equivalent of grep -i.

What is the way to do this?

Answer 1

In [229]: df.filter(regex=r'(?i)hdgfl1', axis=0)
Out[229]: 
                          0         1         2
1421293_at Hdgfl1  2.140412  1.143337  3.260313
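
The `(?i)` prefix is an inline regex flag that turns on case-insensitive matching, and `axis=0` makes `filter` test the row labels instead of the column labels. Below is a minimal sketch, assuming the query may contain regex metacharacters and should be matched literally. The helper name `filter_rows_icase` is hypothetical and not part of pandas.

```python
import re

import pandas as pd

mydict = {"1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
          "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820]}
df = pd.DataFrame.from_dict(mydict, orient='index')


def filter_rows_icase(frame, query):
    # Hypothetical helper: escape the query so regex metacharacters are
    # treated literally, then prepend the inline (?i) flag so the search
    # ignores case. axis=0 applies the regex to the row labels.
    return frame.filter(regex='(?i)' + re.escape(query), axis=0)


result = filter_rows_icase(df, 'hdgfl1')
print(result)
```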

Answer 2

You can do it like this:

query = 'hdgfl1'
mask = df.index.to_series().str.contains(query, case=False)
df[mask]

Another possibility is:

mask = df.reset_index()['index'].str.contains(query, case=False)

But this is about 2x slower.
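
An `Index` also exposes the `.str` accessor directly, so the intermediate Series is not strictly required. The sketch below assumes that variant; `regex=False` is an addition not found in the answer above, used here so the query is treated as a plain substring instead of a pattern.

```python
import pandas as pd

mydict = {"1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
          "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820]}
df = pd.DataFrame.from_dict(mydict, orient='index')

query = 'hdgfl1'

# Boolean mask with one entry per row label; case=False ignores case and
# regex=False makes the query a literal substring.
mask = df.index.str.contains(query, case=False, regex=False)

# Keep only the rows whose label matched.
result = df.loc[mask]
print(result)
```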

Answer 3

And using select():

import pandas as pd
import re

mydict = {
"1421293_at Hdgfl1":[2.140412,1.143337,3.260313],
"1429877_at Lrriq3":[ 9.019368,0.874524,2.051820],
"1421293_at hDGFl1":[2.140412,1.143337,3.260313],
}

df = pd.DataFrame.from_dict(mydict, orient='index')

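def create_match_func(a_str):
    # Build a predicate that tests one index label against a_str, ignoring case.
    def match_func(x):
        # With re.X the space in the pattern is ignored, so this is ".*<a_str>".
        pattern = r".* {}".format(a_str)
        match_obj = re.search(pattern, x, flags=re.X|re.I)
        return match_obj is not None

    return match_func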

print(df)
print('-' * 20)

target = "hdgfl1"
# Note: DataFrame.select() was deprecated in pandas 0.21 and removed in 1.0.
print(df.select(create_match_func(target), axis=0))

--output:--
                          0         1         2
1421293_at Hdgfl1  2.140412  1.143337  3.260313
1429877_at Lrriq3  9.019368  0.874524  2.051820
1421293_at hDGFl1  2.140412  1.143337  3.260313
--------------------
                          0         1         2
1421293_at Hdgfl1  2.140412  1.143337  3.260313
1421293_at hDGFl1  2.140412  1.143337  3.260313

...

df.select(lambda x: x == 'A', axis=1)

select() takes a function that operates on the labels of the given axis; the function should return a boolean.

http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-select-method
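
`DataFrame.select` was deprecated in pandas 0.21 and removed in 1.0. The following is a sketch of the same label-predicate idea on current versions, assuming `df.loc` with a boolean mask built by `Index.map`. The names `matches` and `pattern` are illustrative, and `re.escape` is an added assumption that the target should be matched literally.

```python
import re

import pandas as pd

mydict = {
    "1421293_at Hdgfl1": [2.140412, 1.143337, 3.260313],
    "1429877_at Lrriq3": [9.019368, 0.874524, 2.051820],
    "1421293_at hDGFl1": [2.140412, 1.143337, 3.260313],
}
df = pd.DataFrame.from_dict(mydict, orient='index')

target = "hdgfl1"
# Compile once; re.I makes the search case-insensitive.
pattern = re.compile(re.escape(target), flags=re.I)


def matches(label):
    # Return a real boolean, as a label predicate should.
    return pattern.search(label) is not None


# Apply the predicate to every row label and keep the rows where it is True.
mask = df.index.map(matches)
result = df.loc[mask]
print(result)
```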
