Pandas: check which substring is in a string column

Question (votes: 0, answers: 5)

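I am trying to write a function that adds a new column to a pandas DataFrame: for each row, it should work out which of several substrings occurs in a string column and store that substring in the new column.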

The problem is that the text to find does not appear at the same position in the variable x.

import pandas as pd

df = pd.DataFrame({'x': ["var_m500_0_somevartext","var_m500_0_vartextagain",
"varwithsomeothertext_0_500", "varwithsomext_m150_0_text"], 'x1': [4, 5, 6,8]})

finds = ["m500_0","0_500","m150_0"]

Which entry of finds occurs in a given df["x"]?

I have written a function that works, but it is very slow on large datasets:

def pd_create_substring_var(df,new_var_name = "new_var",substring_list=["1"],var_ori="x"):
    import re
    df[new_var_name] = "na"
    cols =  list(df.columns)
    for ix in range(len(df)):
        for find in substring_list:
            for m in re.finditer(find, df.iloc[ix][var_ori]):
                df.iat[ix, cols.index(new_var_name)] = df.iloc[ix][var_ori][m.start():m.end()]
    return df


df = pd_create_substring_var(df,"t",finds,var_ori="x")

df 
                            x  x1       t
0      var_m500_0_somevartext   4  m500_0
1     var_m500_0_vartextagain   5  m500_0
2  varwithsomeothertext_0_500   6   0_500
3   varwithsomext_m150_0_text   8  m150_0
python pandas dataframe
5 Answers
1
vote

This may not be the best approach:

df['t'] = df['x'].apply(lambda x: ''.join([i for i in finds if i in x]))

Now:

print(df)

Output:

                            x  x1       t
0      var_m500_0_somevartext   4  m500_0
1     var_m500_0_vartextagain   5  m500_0
2  varwithsomeothertext_0_500   6   0_500
3   varwithsomext_m150_0_text   8  m150_0
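
One caveat with the join-based approach above: when a row contains more than one entry of finds, the matches are concatenated into a single string. A minimal sketch of that behavior follows; the second row (`var_m500_0_500_text`) is invented for illustration and is not part of the original data.

```python
import pandas as pd

finds = ["m500_0", "0_500", "m150_0"]

# Hypothetical data: the second row is made up so that it contains
# two entries of `finds` ("m500_0" and "0_500").
demo = pd.DataFrame({"x": ["var_m500_0_somevartext", "var_m500_0_500_text"]})

# Same expression as in the answer: join every entry of `finds` found in x.
demo["t"] = demo["x"].apply(lambda x: "".join([i for i in finds if i in x]))

print(demo)
```

If only one match per row is wanted, returning the first hit instead of joining avoids the concatenation.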

Also, building on @pythonjokeun's answer, you can do:

df["t"] = df["x"].str.extract("(%s)" % '|'.join(finds))

Or:

df["t"] = df["x"].str.extract("({})".format('|'.join(finds)))

Or:

df["t"] = df["x"].str.extract("(" + '|'.join(finds) + ")")
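
The three variants above build the same alternation pattern. As a sketch of a slightly more defensive version, assuming the entries of finds could contain regex metacharacters (the ones in the question do not), each entry can be passed through `re.escape`, and `expand=False` makes `str.extract` return a Series directly:

```python
import re
import pandas as pd

df = pd.DataFrame({"x": ["var_m500_0_somevartext", "var_m500_0_vartextagain",
                         "varwithsomeothertext_0_500", "varwithsomext_m150_0_text"],
                   "x1": [4, 5, 6, 8]})
finds = ["m500_0", "0_500", "m150_0"]

# re.escape guards against entries containing characters such as "." or "+",
# which would otherwise be interpreted as regex syntax.
pattern = "(" + "|".join(re.escape(f) for f in finds) + ")"

# expand=False returns a Series instead of a one-column DataFrame.
# Rows where nothing matches end up as missing values.
df["t"] = df["x"].str.extract(pattern, expand=False)

print(df)
```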

3
votes

Does this do what you need?

finds = ["m500_0", "0_500", "m150_0"]
df["t"] = df["x"].str.extract(f"({'|'.join(finds)})")

1
vote

I don't know how large your dataset is, but you can use the map function like this:

import pandas

def subset_df_test():
  df = pandas.DataFrame({'x': ["var_m500_0_somevartext", "var_m500_0_vartextagain",
                         "varwithsomeothertext_0_500", "varwithsomext_m150_0_text"], 'x1': [4, 5, 6, 8]})

  finds = ["m500_0", "0_500", "m150_0"]
  # Map each value of x to the first entry of finds that it contains
  df['t'] = df['x'].map(lambda x: compare(x, finds))
  print(df)

def compare(x, finds):
  # Return the first substring found in x (None if nothing matches)
  for f in finds:
    if f in x:
      return f

1
vote

Use Series.str.findall:

df['x'].str.findall("|".join(finds))

0    [m500_0]
1    [m500_0]
2     [0_500]
3    [m150_0]
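
Note that `findall` returns a list per row rather than a plain string. A minimal sketch of turning that into the column the question asks for, assuming only the first match per row is wanted:

```python
import pandas as pd

df = pd.DataFrame({"x": ["var_m500_0_somevartext", "var_m500_0_vartextagain",
                         "varwithsomeothertext_0_500", "varwithsomext_m150_0_text"],
                   "x1": [4, 5, 6, 8]})
finds = ["m500_0", "0_500", "m150_0"]

# Each element of `matches` is a list of all non-overlapping matches in that row.
matches = df["x"].str.findall("|".join(finds))

# .str[0] picks the first element of each list.
df["t"] = matches.str[0]

print(df)
```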

0
votes

Try this:

df["t"] = df["x"].apply(lambda x: [i for i in finds if i in x][0])
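
The indexing with `[0]` above raises an IndexError for any row that contains none of the entries in finds. A possible variant, sketched here with an invented non-matching row (`no_match_here` is not from the original data), uses `next()` with a default instead:

```python
import pandas as pd

finds = ["m500_0", "0_500", "m150_0"]

# Hypothetical data: the second row is made up and contains no entry of `finds`.
demo = pd.DataFrame({"x": ["var_m500_0_somevartext", "no_match_here"]})

# next() returns the first hit from the generator, or the default (None)
# when the generator is empty, so no IndexError is raised.
demo["t"] = demo["x"].apply(lambda x: next((i for i in finds if i in x), None))

print(demo)
```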