# How do I select rows from a DataFrame based on column values?

##### Question (votes: 1598, answers: 10)

``````SELECT *
FROM table
WHERE column_name = some_value
``````

python pandas dataframe
##### 10 Answers
3065

To select rows whose column value equals a scalar `some_value`, use `==`:

``````df.loc[df['column_name'] == some_value]
``````

To select rows whose column value is in an iterable `some_values`, use `isin`:

``````df.loc[df['column_name'].isin(some_values)]
``````

Combine multiple conditions with `&`:

``````df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
``````

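Note the parentheses. In Python, `&` binds more tightly than `<=` and `>=`, so without them the expression

``````df['column_name'] >= A & df['column_name'] <= B
``````

is parsed as

``````df['column_name'] >= (A & df['column_name']) <= B
``````

which raises a `ValueError` because the truth value of a Series is ambiguous.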
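A small sketch of why the parentheses matter, on made-up data (the frame `df` and the bounds `A` and `B` below are illustrative assumptions, not part of the question). It also shows `Series.between` as one possible alternative for inclusive ranges.

```python
import pandas as pd

# Illustrative data; the column values and bounds are assumptions.
df = pd.DataFrame({'column_name': [1, 5, 10, 15, 20]})
A, B = 5, 15

# Parenthesized comparisons: each side becomes a boolean Series first.
in_range = df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

# Without parentheses, `&` is applied first and the chained comparison
# ends up asking for the truth value of a whole Series.
try:
    df['column_name'] >= A & df['column_name'] <= B
    raised = False
except ValueError:
    raised = True

# Series.between includes both endpoints by default.
same_rows = df.loc[df['column_name'].between(A, B)]
```

Here `in_range` and `same_rows` hold the same rows, and `raised` records that the unparenthesized form failed.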

To select rows whose column value does not equal `some_value`, use `!=`:

``````df.loc[df['column_name'] != some_value]
``````

`isin` returns a boolean Series, so to select rows whose value is *not* in `some_values`, negate the boolean Series with `~`:

``````df.loc[~df['column_name'].isin(some_values)]
``````
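As a quick illustration of the negation, the sketch below builds the `isin` mask once and uses it both ways. The data and the contents of `some_values` are made up for the example.

```python
import pandas as pd

# Illustrative data, not taken from the answer above.
df = pd.DataFrame({'A': ['foo', 'bar', 'baz', 'foo'], 'C': [0, 1, 2, 3]})
some_values = ['foo', 'baz']

mask = df['A'].isin(some_values)   # boolean Series, one entry per row
kept = df.loc[mask]                # rows whose A is in some_values
dropped = df.loc[~mask]            # rows whose A is not in some_values
```

Every row lands in exactly one of `kept` or `dropped`, which makes the mask handy when both halves are needed.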

``````import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])
``````

``````     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14
``````

``````print(df.loc[df['B'].isin(['one','three'])])
``````

``````     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14
``````

The same selection can also be made through the index: set `B` as the index, then use `df.loc`:

``````df = df.set_index(['B'])
print(df.loc['one'])
``````

``````       A  C   D
B
one  foo  0   0
one  bar  1   2
one  foo  6  12
``````

To include multiple values from the index, use `df.index.isin`:

``````df.loc[df.index.isin(['one','two'])]
``````

``````       A  C   D
B
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
``````
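One detail that may be worth knowing when selecting by index: a boolean mask keeps the original row order, while passing a list of labels to `.loc` groups the rows by label. The sketch below rebuilds the example frame under the name `indexed` (a name chosen here for illustration) to compare the two.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
indexed = df.set_index('B')

# Boolean mask on the index: rows stay in their original order.
by_mask = indexed.loc[indexed.index.isin(['one', 'two'])]

# List of labels: rows come back grouped in the order the labels are listed.
by_labels = indexed.loc[['one', 'two']]
```

Both results contain the same six rows; only the ordering differs.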

2

``````Original dataframe:
     A      B
0  foo    one
1  bar    one
2  foo    two
3  bar  three
4  foo    two
5  bar    two
6  foo    one
7  foo  three
Sub dataframe where B is two:
     A    B
0  foo  two
1  foo  two
2  bar  two
``````
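The code that produced this output is not shown. A minimal sketch that would print something equivalent, assuming the sub-frame was renumbered with `reset_index(drop=True)` (suggested by its 0, 1, 2 index), might look like this:

```python
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split()})
print('Original dataframe:')
print(df)

# Keep rows where B is 'two' and renumber them from 0 (an assumption
# made to match the index shown in the output above).
sub = df[df['B'] == 'two'].reset_index(drop=True)
print('Sub dataframe where B is two:')
print(sub)
```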

248

### tl;dr

``````select * from table where column_name = some_value
``````

is

``````table[table.column_name == some_value]
``````

Multiple conditions:

``````table[(table.column_name == some_value) | (table.column_name2 == some_value2)]
``````

or

``````table.query('column_name == some_value | column_name2 == some_value2')
``````

Code example:

``````import pandas as pd

# Create data set
d = {'foo':[100, 111, 222],
     'bar':[333, 444, 555]}
df = pd.DataFrame(d)

# Full dataframe:
df

# Shows:
#    bar   foo
# 0  333   100
# 1  444   111
# 2  555   222

# Output only the row(s) in df where foo is 222:
df[df.foo == 222]

# Shows:
#    bar  foo
# 2  555  222
``````

In the code above, the line `df[df.foo == 222]` selects rows based on the column value, `222` in this case.

``````df[(df.foo == 222) | (df.bar == 444)]
#    bar  foo
# 1  444  111
# 2  555  222
``````

At that point, though, I would recommend the `query` function, since it is less verbose and yields the same result:

``````df.query('foo == 222 | bar == 444')
``````
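When the values to match live in Python variables rather than being typed into the string, `query` can reference them with an `@` prefix. The sketch below assumes two hypothetical variables, `target_foo` and `target_bar`, and checks the result against the boolean-mask form.

```python
import pandas as pd

df = pd.DataFrame({'foo': [100, 111, 222], 'bar': [333, 444, 555]})

# Hypothetical variables holding the values to match.
target_foo = 222
target_bar = 444

# `@name` inside the query string refers to a Python variable.
with_query = df.query('foo == @target_foo | bar == @target_bar')

# Equivalent boolean-mask form, for comparison.
with_mask = df[(df.foo == target_foo) | (df.bar == target_bar)]
```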

16

``````In [68]: %timeit df.iloc[np.where(df.A.values=='foo')]  # fastest
1000 loops, best of 3: 380 µs per loop

In [69]: %timeit df.loc[df['A'] == 'foo']
1000 loops, best of 3: 745 µs per loop

In [71]: %timeit df.loc[df['A'].isin(['foo'])]
1000 loops, best of 3: 562 µs per loop

In [72]: %timeit df[df.A=='foo']
1000 loops, best of 3: 796 µs per loop

In [74]: %timeit df.query('(A=="foo")')  # slowest
1000 loops, best of 3: 1.71 ms per loop
``````
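Timings like these depend heavily on the pandas version, the data size, and the machine, so it can be worth re-measuring. The sketch below is one way to do that outside IPython with the standard `timeit` module. The frame size and the `candidates` dictionary are choices made for this example, and it uses `np.where(...)[0]` with `to_numpy()` rather than the exact expression timed above.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative frame: the 8-row example repeated 1000 times.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split() * 1000,
                   'C': np.arange(8000)})

candidates = {
    'np.where + iloc': lambda: df.iloc[np.where(df['A'].to_numpy() == 'foo')[0]],
    'loc + ==': lambda: df.loc[df['A'] == 'foo'],
    'isin': lambda: df.loc[df['A'].isin(['foo'])],
    'query': lambda: df.query('A == "foo"'),
}

reference = df.loc[df['A'] == 'foo']
for name, fn in candidates.items():
    # Every approach should select the same rows before we compare speed.
    assert fn().equals(reference)
    seconds = timeit.timeit(fn, number=100)
    print(f'{name:>16}: {seconds / 100 * 1e6:8.1f} us per call')
```

No expected numbers are given here on purpose; the ranking may differ from the one shown above.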

13

``````from pandas import DataFrame

# Create data set
d = {'Revenue':[100,111,222],
     'Cost':[333,444,555]}
df = DataFrame(d)

# mask = Return True when the value in column "Revenue" is equal to 111
mask = df['Revenue'] == 111

# Result:
# 0    False
# 1     True
# 2    False
# Name: Revenue, dtype: bool

# Select * FROM df WHERE Revenue = 111
df[mask]

# Result:
#    Cost    Revenue
# 1  444     111
``````
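Once the mask exists it can be reused in other selections. The sketch below shows a few variations on the same `Revenue`/`Cost` data; the SQL in the comments is only a rough analogy.

```python
import pandas as pd

df = pd.DataFrame({'Revenue': [100, 111, 222], 'Cost': [333, 444, 555]})
mask = df['Revenue'] == 111

rows = df[mask]                 # roughly: SELECT * FROM df WHERE Revenue = 111
costs = df.loc[mask, 'Cost']    # roughly: SELECT Cost FROM df WHERE Revenue = 111
others = df[~mask]              # roughly: SELECT * FROM df WHERE Revenue <> 111
```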

10

### Combining `.query` with column selection, e.g. `df.query("column_name == 'some_value'")[[col_name1, col_name2]]`, gives more flexibility:

Answer updated August 2019
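The body of this answer is not preserved here, so the following is only a sketch of what `.query` plus column selection can look like. The frame and its column names are invented for the example. The backtick quoting for a column name containing spaces assumes pandas 0.25 or later.

```python
import pandas as pd

# Invented data; 'col name 2' deliberately contains spaces.
df = pd.DataFrame({'column_name': ['some_value', 'other', 'some_value'],
                   'col_name1': [1, 2, 3],
                   'col name 2': [4, 5, 6]})

# Filter rows with query, then keep only the columns of interest.
picked = df.query("column_name == 'some_value'")[['col_name1', 'col name 2']]

# Backticks let a query refer to a column whose name has spaces.
spaced = df.query('`col name 2` > 4')
```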

9

``df.groupby('column_name').get_group('column_desired_value').reset_index()``
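A small sketch of the `groupby(...).get_group(...)` approach on made-up data, compared with a plain boolean mask. One behavior to keep in mind is that `get_group` raises `KeyError` when the value is absent, whereas a mask simply returns an empty frame.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo'], 'C': [0, 1, 2]})

via_group = df.groupby('A').get_group('foo')
via_mask = df.loc[df['A'] == 'foo']

# A value that is not present: get_group raises, the mask returns no rows.
try:
    df.groupby('A').get_group('missing')
    missing_raised = False
except KeyError:
    missing_raised = True
empty = df.loc[df['A'] == 'missing']
```

Calling `.reset_index()` on the result, as in the one-liner above, moves the old row labels into a new `index` column; pass `drop=True` to discard them instead.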