Creating an indicator variable from a threshold in Python while keeping NaN values as NaN


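I have some float data from a conductivity probe that contains a few NaNs. I want to convert the probe data to an indicator variable based on an empirical threshold, but I want the NaN values to stay NaN. Converting to an indicator seems straightforward; the problem is how to handle the NaNs. Here is an example with a threshold of 50: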

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = (df.x <=50)*1

This yields:

      x  indicator
0   0.0          1
1   NaN          0
2   2.0          1
3   3.0          1
4   4.0          1
5  51.0          0
6  61.0          0
7  71.0          0
8  81.0          0
9  91.0          0

But I want the indicator to be NaN wherever x is NaN, like this:

      x  indicator
0   0.0          1
1   NaN        NaN  
2   2.0          1
3   3.0          1
4   4.0          1
5  51.0          0
6  61.0          0
7  71.0          0
8  81.0          0
9  91.0          0

Any help would be appreciated. Thanks.

python pandas nan
Answers

2 votes

You can try this. It keeps the NaN, but the column holds the value of x rather than 1 where x <= 50:

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = df.x*(df.x <=50)

Output:

      x  indicator
0   0.0        0.0
1   NaN        NaN
2   2.0        2.0
3   3.0        3.0
4   4.0        4.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0
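
As a rough illustration of why the two expressions behave differently, here is a minimal sketch on a three-element Series (the names `s`, `mask`, `as_int` and `scaled` are made up for this example). A comparison against NaN evaluates to False, so the missing value is lost as soon as the mask is built, whereas arithmetic on NaN propagates it.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, 51.0])

mask = s <= 50      # comparison with NaN is False, so the NaN row looks "above threshold"
as_int = mask * 1   # the question's approach: NaN has already become 0
scaled = s * mask   # the approach above: NaN * False is still NaN

print(mask.tolist())    # [True, False, False]
print(as_int.tolist())  # [1, 0, 0]
print(scaled.tolist())  # [0.0, nan, 0.0]
```

So multiplying by the original column keeps the NaN, at the cost of carrying the value of x instead of a 0/1 flag.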

For the exact output:

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
# NaN where x is missing, otherwise 1.0/0.0 for the threshold test (<= 50, as in the question)
df['indicator'] = np.where(df.x.isnull(), np.nan, df.x <= 50)

Output:

      x  indicator
0   0.0        1.0
1   NaN        NaN
2   2.0        1.0
3   3.0        1.0
4   4.0        1.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0
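
The `np.where` result is a float column (1.0/0.0/NaN), because NaN cannot live in a plain integer column. If integer flags are preferred, one possible variation is pandas' nullable `Int64` dtype, which shows missing entries as `<NA>`. This is only a sketch and assumes a pandas version with nullable integer support (1.0 or later); the variable `ind` is made up for the example.

```python
import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Build 0/1 flags in the nullable integer dtype, then blank out the rows
# where the input itself is missing.
ind = (df["x"] <= 50).astype("Int64")
ind[df["x"].isna()] = pd.NA
df["indicator"] = ind

print(df)
```

The trade-off is that downstream code has to cope with `pd.NA` rather than `np.nan`.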

1 vote

IIUC:

In [1829]: df['indicator'] = df[df.x <=50]*1

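The indicator is set only for rows where x <= 50; every other row, including those above the threshold, is left as NaN.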

In [1830]: df
Out[1830]:
      x  indicator
0   0.0        0.0
1   NaN        NaN
2   2.0        2.0
3   3.0        3.0
4   4.0        4.0
5  51.0        NaN
6  61.0        NaN
7  71.0        NaN
8  81.0        NaN
9  91.0        NaN
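
The NaNs in that result come from index alignment: the filtered selection only contains the rows that passed the test, and assigning it back fills every other index label with NaN. A minimal sketch of that mechanism, using a shortened version of the data and made-up names (`below`, `kept`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0, np.nan, 2, 51]})

# Keep only the rows at or below the threshold; the NaN row fails the test too.
below = df.loc[df["x"] <= 50, "x"]
print(below.index.tolist())  # [0, 2]

# Assignment aligns on the index, so labels 1 and 3 are filled with NaN.
df["kept"] = below
print(df)
```

Note that a missing reading (row 1) and a reading above the threshold (row 3) both end up as NaN here, so the two cases can no longer be told apart.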

1 vote

I think I would try applying a lambda to the column :)

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
indicator = lambda x: np.nan if (np.isnan(x)) else (x<=50)*1 
df['indicator'] = df['x'].apply(indicator)
print(df)

Prints:

      x  indicator
0   0.0        1.0
1   NaN        NaN
2   2.0        1.0
3   3.0        1.0
4   4.0        1.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0
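
For larger probe logs, a vectorized version of the same idea may be worth considering, since `apply` calls the lambda once per row. The helper below is a sketch only; `threshold_indicator` and its parameters are hypothetical names, not part of pandas.

```python
import numpy as np
import pandas as pd


def threshold_indicator(values: pd.Series, threshold: float) -> pd.Series:
    """Return 1.0 where values <= threshold, 0.0 where above, NaN where missing."""
    flags = (values <= threshold).astype(float)
    # Series.where keeps entries where the condition holds and puts NaN elsewhere.
    return flags.where(values.notna())


x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})
df["indicator"] = threshold_indicator(df["x"], 50)
print(df)
```

Passing the threshold as an argument also makes it easy to reuse the function with a different empirical cutoff.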