Pandas - find all rows where specific column values are part of a given text

Question (votes: 0, answers: 1)

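I need to find the rows of a DataFrame in which the values of certain columns (string type, dtype object) each occur as part of a given string. This is the reverse of str.contains or isin(): here the column value is the substring, and the given string is the text being searched. The string cannot be split cleanly into its parts, because the three values "Cityname", "Districtname" and "Streetname" can themselves contain spaces.

Can you help me?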

s = "Bad Testcity Teststr."
df_res = df.loc[(s.find(df['CITY']) != -1) & (s.find(df['DISTRICT']) != -1) & (s.find(df['STREET']) != -1)]

For this example, the check should return True.

<bound method DataFrame.info of             ZIP              CITY               STREET NUMBER   NUMBER_SFX         DISTRICT   ONKZ ASB      ADSL      VDSL   VDSL_SV         VPSZ OUTDOOR
ID
4025217   12345  Bad Testcity          Teststr.          6              NaN  Bad Testcity  12345   2  +017.696  +102.784       NaN  49/12345/30       O
4025219   12345  Bad Testcity          Teststr.          7              NaN  Bad Testcity  12345   2  +017.696  +102.784       NaN  49/12345/30       O
4025242   12345  Bad Testcity          Teststr.          8              NaN  Bad Testcity  12345   2  +017.696  +102.784  +185.824  49/12345/30       O
4025244   12345  Bad Testcity          Teststr.         10              NaN  Bad Testcity  12345   2  +017.696  +102.784       NaN  49/12345/30       O
4025245   12345  Bad Testcity          Teststr.         11              NaN  Bad Testcity  12345   2  +017.696  +051.392       NaN  49/12345/30       O
...         ...              ...                   ...        ...              ...              ...    ...  ..       ...       ...       ...          ...     ...

[1569530 rows x 13 columns]>
python pandas substring
1 Answer
0 votes

Assuming this input:

           ZIP          CITY    STREET  NUMBER  NUMBER_SFX      DISTRICT   ONKZ  ASB    ADSL     VDSL  VDSL_SV         VPSZ OUTDOOR
ID                                                                                                                                 
4025217  12345  Bad Testcity  Teststr.       6         NaN  Bad Testcity  12345    2  17.696  102.784      NaN  49/12345/30       O
4025219  12345  Bad Testcity  Teststr.       7         NaN  Bad Testcity  12345    2  17.696  102.784      NaN  49/12345/30       O
4025242  12345  Bad Testcity  Teststr.       8         NaN  Bad Testcity  12345    2  17.696  102.784  185.824  49/12345/30       O
4025244  12345  Bad Testcity  Teststr.      10         NaN  Bad Testcity  12345    2  17.696  102.784      NaN  49/12345/30       O
4025245  12345  Bad Testcity  Teststr.      11         NaN  Bad Testcity  12345    2  17.696   51.392      NaN  49/12345/30       O

You can concatenate the columns with a space as separator, then use str.contains on the result:

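s = "Bad Testcity Teststr."

# regex=False matches s literally (otherwise "." is a regex wildcard); na=False treats missing values as no match
df_res = df.loc[(df['CITY'] + ' ' + df['DISTRICT'] + ' ' + df['STREET']).str.contains(s, regex=False, na=False)]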

Output (unchanged here):

           ZIP          CITY    STREET  NUMBER  NUMBER_SFX      DISTRICT   ONKZ  ASB    ADSL     VDSL  VDSL_SV         VPSZ OUTDOOR
ID                                                                                                                                 
4025217  12345  Bad Testcity  Teststr.       6         NaN  Bad Testcity  12345    2  17.696  102.784      NaN  49/12345/30       O
4025219  12345  Bad Testcity  Teststr.       7         NaN  Bad Testcity  12345    2  17.696  102.784      NaN  49/12345/30       O
4025242  12345  Bad Testcity  Teststr.       8         NaN  Bad Testcity  12345    2  17.696  102.784  185.824  49/12345/30       O
4025244  12345  Bad Testcity  Teststr.      10         NaN  Bad Testcity  12345    2  17.696  102.784      NaN  49/12345/30       O
4025245  12345  Bad Testcity  Teststr.      11         NaN  Bad Testcity  12345    2  17.696   51.392      NaN  49/12345/30       O
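If the goal is literally the reverse of str.contains, that is, each column value must itself occur somewhere inside s, a row-wise check with Python's `in` operator may be closer to what the question asks. The following is only a minimal sketch, not part of the answer above: the small DataFrame is a made-up sample (the last two rows are invented to show non-matches), and `cols` and `is_part_of` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Small hypothetical sample; the last two rows are invented to show non-matches.
df = pd.DataFrame(
    {
        "CITY": ["Bad Testcity", "Bad Testcity", "Otherville", "Bad Testcity"],
        "DISTRICT": ["Bad Testcity", "Bad Testcity", "Nord", "Bad Testcity"],
        "STREET": ["Teststr.", "Teststr.", "Hauptstr.", None],
        "NUMBER": [6, 7, 1, 2],
    },
    index=pd.Index([4025217, 4025219, 9000001, 9000002], name="ID"),
)

s = "Bad Testcity Teststr."
cols = ["CITY", "DISTRICT", "STREET"]  # columns whose values must occur in s


def is_part_of(value, text=s):
    """Return True if value is a string that occurs literally inside text."""
    return isinstance(value, str) and value in text


# Start with every row selected, then require each column value to occur in s.
mask = pd.Series(True, index=df.index)
for col in cols:
    mask &= df[col].map(is_part_of)

df_res = df.loc[mask]
print(df_res)
```

Missing values count as non-matches here, and the comparison is literal, so the "." in "Teststr." is not treated as a wildcard. Note that an empty string would count as a match, since "" is contained in every string. Unlike the concatenation approach, this does not depend on the order in which the values appear in s.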