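I need to extract rows based on 3 conditions:
The column col1 should contain all the words in list_words.
The first row should end with the word Story.
The following rows should end with ac.
With the help of this question, "Extracting rows based on conditions Pandas Python", I managed to get it working. The problem is that I need to extract every row that ends with Story together with all the rows after it that end with ac, not only the first one. This is my current code: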
import pandas as pd
df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Plan Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']})
print(df)
list_words="SW Quality Plan Story"
set_words = set(list_words.split())
df["Suffix"] = df.col1.apply(lambda x: x.split()[-1])
# Condition 1: every word in set_words must appear in col1 (set_words minus the words in col1 must be empty)
df["condition_1"] = df.col1.apply(lambda x: not bool(set_words - set(x.split())))
# Condition 2: the last word should be 'Story'
df["condition_2"] = df.col1.str.endswith("Story")
# Condition 3: the last word in the next row should be ac. See `shift(-1)`
df["condition_3"] = df.col1.str.endswith("ac").shift(-1)
# Condition 4: the last word in the current row should be ac
df["condition_4"] = df.col1.str.endswith("ac")
# When all three conditions meet: new column 'conditions'
df["conditions"] = df.condition_1 & df.condition_2 & df.condition_3
df["conditions&"] = df.conditions | df.conditions.shift(1)
print(df[['condition_1', 'condition_2','condition_3' ,'condition_4']])
df.to_excel('cond.xlsx', sheet_name='Sheet1', index=True)
df["TrueFalse"] = df.conditions | df.conditions.shift(1)
df1=df[["col1", "col2", "TrueFalse", "Suffix"]][df.TrueFalse]
print(df1)
This is my output:
0 Draft SW Quality Assurance Plan Story aa True Story
1 alex ac bb True ac
6 Complete SW Quality Assurance Plan Story gg True Story
7 celine ac hh True ac
11 Update SW Quality Assurance Plan Story ll True Story
12 joseph ac mm True ac
This is the output I want:
0 Draft SW Quality Assurance Plan Story aa True Story
1 alex ac bb True ac
2 anny ac cc True ac
3 antoine ac dd True ac
6 Complete SW Quality Assurance Plan Story gg True Story
7 celine ac hh True ac
11 Update SW Quality Assurance Plan Story ll True Story
12 joseph ac mm True ac
13 Update SW Quality Assurance Plan ac nn True ac
14 joseph ac oo True ac
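I need to extract all the rows that end with ac after a row that ends with Story (including the second and third rows), not just the first one. Is this doable?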
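You can probably do this by creating one column that is True when a row meets both conditions: it ends with Story (endswith) and contains all the words. Create another column that is True for rows that end with ac (endswith). Then use groupby on the cumsum of the first column, take any across the two columns 'gr' and 'ac', and apply cummin. This means that within each group, once a row fails the condition, every later row in that group becomes False, even if it ends with ac. The groupby produces a mask that is True for the rows you want to keep, so use loc with this mask.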
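To see what the cumsum and cummin steps compute, here is a minimal toy sketch. The Series names `starts`, `candidate`, `block_id` and `keep` are made up for illustration, and the flags are cast to int before cummin only as a precaution.

```python
import pandas as pd

# Toy flags (hypothetical data): True where a block starts,
# and True where a row is a candidate to keep (a start row or an 'ac' row).
starts = pd.Series([True, False, False, False, True, False])
candidate = pd.Series([True, True, False, True, True, True])

# cumsum over the start flags gives every row the number of its block.
block_id = starts.cumsum()
print(block_id.tolist())   # [1, 1, 1, 1, 2, 2]

# Within each block, cummin drops to 0 at the first non-candidate row and stays there.
keep = candidate.astype(int).groupby(block_id).cummin().astype(bool)
print(keep.tolist())       # [True, True, False, False, True, True]
```

The fourth row is a candidate on its own, but it comes after a failing row in the same block, so it is masked out.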
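import numpy as np

# 'gr' marks the start of a group: the row ends with 'Story' and contains all the words
df['gr'] = (df['col1'].str.endswith('Story')
            & df['col1'].apply(lambda x: not bool(set_words - set(x.split()))))
# 'ac' marks the rows that end with 'ac'
df['ac'] = df['col1'].str.endswith('ac')
# per group, keep rows until the first one that is neither a start row nor an 'ac' row
df_f = df.loc[df.groupby(df['gr'].cumsum())
                .apply(lambda x: np.any(x[['gr', 'ac']], axis=1).cummin())
                .to_numpy(), ['col1', 'col2']]
print(df_f)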
col1 col2
0 Draft SW Quality Assurance Plan Story aa
1 alex ac bb
2 anny ac cc
3 antoine ac dd
6 Complete SW Quality Assurance Plan Story gg
7 celine ac hh
11 Update SW Quality Assurance Plan Story ll
12 joseph ac mm
13 Update SW Quality Assurance Plan ac nn
14 joseph ac oo
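If you would rather avoid groupby.apply and the positional to_numpy() mask, the same idea can be sketched with the grouped cummin directly. This is an untested-on-your-real-data variant, not a drop-in replacement. The names `starts`, `ends_ac`, `block_id` and `keep` are my own, and the extra `block_id > 0` check is an assumption on my part: it drops any 'ac' rows that appear before the first Story row.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['Draft SW Quality Assurance Plan Story', 'alex ac', 'anny ac',
             'antoine ac', 'aze epic', 'bella ac',
             'Complete SW Quality Assurance Plan Story', 'celine ac',
             'wqas epic', 'karmen ac', 'kameilia ac',
             'Update SW Quality Assurance Plan Story', 'joseph ac',
             'Update SW Quality Assurance Plan ac', 'joseph ac'],
    'col2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh',
             'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo'],
})
set_words = set("SW Quality Plan Story".split())

# A block starts at a row that ends with 'Story' and contains every required word.
starts = (df['col1'].str.endswith('Story')
          & df['col1'].apply(lambda s: set_words.issubset(s.split())))
ends_ac = df['col1'].str.endswith('ac')

# Block number per row: 0 before the first start, then 1, 2, ...
block_id = starts.cumsum()

# Within each block, cummin turns 0 at the first row that is neither a start
# nor an 'ac' row, and stays 0 for the rest of the block.
candidate = (starts | ends_ac).astype(int)
keep = candidate.groupby(block_id).cummin().astype(bool) & (block_id > 0)

df_f = df.loc[keep, ['col1', 'col2']]
print(df_f)
```

Because `keep` is a boolean Series that shares the index of df, loc aligns it by label, so the result does not depend on the row order that groupby returns.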