Check element-wise whether a string is contained in another string

Question  Votes: 3  Answers: 4

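I am looking for a way to check whether one string can be found inside another string. str.contains only accepts a single fixed string pattern as its argument, whereas I want an element-wise comparison between two string columns.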

import pandas as pd

df = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'],
               'short': ['some', 'other', 'stuff']})


# This fails:
df['short_in_long'] = df['long'].str.contains(df['short'])

Expected output:

[True, True, False]
python string pandas
4 Answers
3
votes

Use a list comprehension with zip:

df['short_in_long'] = [b in a for a, b in zip(df['long'], df['short'])]

print (df)
            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False

3
votes

This is a prime use case for a list comprehension:

# df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values.tolist()]
df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values]
df

            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False

List comprehensions are usually faster than the pandas string methods because they carry less overhead. See "For loops with pandas - When should I care?"
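To get a feel for the overhead on your own machine, a rough timing sketch like the one below may help. It compares the list comprehension against a row-wise `DataFrame.apply`, which is only one possible baseline; the frame `big` and the helper names are made up for illustration, and absolute timings will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical larger frame: the question's sample rows repeated 1000 times.
big = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'] * 1000,
                    'short': ['some', 'other', 'stuff'] * 1000})


def with_comprehension(frame):
    # Plain Python loop over the two columns in lockstep.
    return [b in a for a, b in zip(frame['long'], frame['short'])]


def with_apply(frame):
    # Row-wise apply: builds a Series object per row, which adds overhead.
    return frame.apply(lambda row: row['short'] in row['long'], axis=1).tolist()


t_comp = timeit.timeit(lambda: with_comprehension(big), number=5)
t_apply = timeit.timeit(lambda: with_apply(big), number=5)
print(f"list comprehension: {t_comp:.4f}s, apply(axis=1): {t_apply:.4f}s")
```

Both helpers return the same list of booleans, so the comparison is about speed only.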


If your data contains NaN, you can call a function with error handling instead:

def try_check(haystack, needle):
    try:
        return needle in haystack
    except TypeError:
        return False

df['short_in_long'] = [try_check(x, y) for x, y in df[['long', 'short']].values]
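As a quick illustration of how this behaves with missing values, here is a self-contained sketch. The frame `df_nan` is invented sample data, not from the question, and the function is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def try_check(haystack, needle):
    """Return True if needle is a substring of haystack, False on type errors."""
    try:
        return needle in haystack
    except TypeError:
        # Raised when either value is not a string, e.g. a float NaN.
        return False


# Hypothetical sample data with one missing value in each column.
df_nan = pd.DataFrame({'long': ['sometext', np.nan, 'evenmoretext'],
                       'short': ['some', 'other', np.nan]})

df_nan['short_in_long'] = [try_check(x, y)
                           for x, y in zip(df_nan['long'], df_nan['short'])]
print(df_nan['short_in_long'].tolist())
```

Rows where either side is NaN come out as False rather than raising, which may or may not be what you want; returning NaN in the except branch would be an alternative if missing results should stay missing.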

3
votes

Check out numpy; its string functions work element-wise, row by row :-).

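import numpy as np

# np.char is the public namespace; np.core.char is a private path that is deprecated in NumPy 2.x
np.char.find(df['long'].values.astype(str), df['short'].values.astype(str)) != -1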
Out[302]: array([ True,  True, False])
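One caveat with casting to `str`: a missing value becomes the literal text 'nan', which can then match by accident. The sketch below shows this on invented data (`df_np` is hypothetical) and masks out rows with missing values afterwards; treat it as one possible workaround, not the answer's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the needle in row 1 is missing.
df_np = pd.DataFrame({'long': ['sometext', 'banana', 'evenmoretext'],
                      'short': ['some', np.nan, 'stuff']})

# Go through an object array first so the cast to str is done by numpy.
haystacks = df_np['long'].to_numpy(dtype=object).astype(str)
needles = df_np['short'].to_numpy(dtype=object).astype(str)

# NaN was turned into the text 'nan', which happens to occur inside 'banana'.
naive = np.char.find(haystacks, needles) != -1

# Keep a match only where both original values were present.
valid = df_np[['long', 'short']].notna().all(axis=1).to_numpy()
masked = naive & valid

print(naive.tolist())
print(masked.tolist())
```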

1
vote

Also,

df['short_in_long'] = df['long'].str.contains('|'.join(df['short'].values))

Update: I misunderstood the question. Here is the corrected version:

df['short_in_long'] = df.apply(lambda x: x['short'] in x['long'], axis=1)
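To see why the joined-pattern approach answers a different question, consider the sketch below on invented data (`df_demo` is hypothetical). Joining with '|' asks whether any of the short strings occurs in each long string, while a row-wise check compares each pair. It assumes the short strings contain no regex metacharacters; otherwise they would need escaping with `re.escape` before joining.

```python
import pandas as pd

# Hypothetical data where the two approaches disagree on the last row.
df_demo = pd.DataFrame({'long': ['sometext', 'someothertext', 'othertext'],
                        'short': ['some', 'other', 'some']})

# Pattern 'some|other|some': True if ANY short string occurs in the long one.
any_short = df_demo['long'].str.contains('|'.join(df_demo['short']))

# Row-wise: True only if the short string from the SAME row occurs.
row_wise = df_demo.apply(lambda row: row['short'] in row['long'], axis=1)

print(any_short.tolist())
print(row_wise.tolist())
```

The last row differs because 'othertext' contains 'other' from another row but not its own 'some'.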