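I know that looping over the rows of a DataFrame is bad practice, but I have a column of a few hundred rows containing lists, and I need to modify every element of each list. I am having trouble using .str.replace()/.strip() to deal with all the extra whitespace and so on. Here is the input: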
import pandas as pd
input_19 = [{'name':'Hector', 'team_position':'forward', 'player_traits':'Finesse Shot, Speed Dribbler (CPU AI Only)'}, {'name':'Bysim', 'team_position':'forward', 'player_traits':'Long Shot Taker (CPU AI Only)'}, {'name':'Nicolas', 'team_position':'defender', 'player_traits':'Beat Offside Trap, Finesse Shot'}]
input_20 = [{'name':'Johann', 'team_position':'gk', 'player_traits':'GK Long Throw'}, {'name':'Winston', 'team_position':'defender', 'player_traits':'Dives Into Tackles (CPU AI Only)'}, {'name':'Petr', 'team_position':'forward', 'player_traits':'Flair, Long Shot Taker (CPU AI Only)'}]
df_19 = pd.DataFrame(input_19)
df_20 = pd.DataFrame(input_20)
Output:
df_19:
      name                               player_traits team_position
0   Hector  Finesse Shot, Speed Dribbler (CPU AI Only)       forward
1    Bysim               Long Shot Taker (CPU AI Only)       forward
2  Nicolas             Beat Offside Trap, Finesse Shot      defender
df_20:
      name                         player_traits team_position
0   Johann                         GK Long Throw            gk
1  Winston      Dives Into Tackles (CPU AI Only)      defender
2     Petr  Flair, Long Shot Taker (CPU AI Only)       forward
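As shown above, the 'player_traits' column in both DataFrames needs string cleanup so that I can count how often each trait occurs. I want to make the changes in the original DataFrames (one per year), so that I can then create new DataFrames by filtering on 'team_position' and use Counter to get the total for each trait/element. Here is my code, but I am not sure how to assign the new 'temp_list' back to the right place in the original df, because combining .loc with .replace() modifies a slice of the DataFrame, and .replace() on a DataFrame only accepts string arguments: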
df_list = [df_19, df_20]
for df in df_list:
    for lst, i in zip(df['player_traits'].values, range(len(df['player_traits'].values))):
        temp_list = []
        if type(lst) != float:
            lst = lst.replace('(CPU AI Only)', "")
            lst = lst.split(",")
            for x in lst:
                x = x.strip()
                temp_list.append(x)
        # df[location of original value in original df] = temp_list
        # something like:
        # df[i, 'player_traits'] = temp_list
How can I complete this code so that the original df values are replaced with the modified lists?
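Move the body of the loop into a function and apply it to the column with .apply(), which assigns the result back row by row:
df['player_traits'] = df['player_traits'].apply(my_function)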
import pandas as pd
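def my_function(lst):
    # Turn one comma-separated trait string into a list of cleaned trait names.
    temp_list = []
    if type(lst) != float:  # NaN (missing traits) is a float, so it is skipped
        lst = lst.replace('(CPU AI Only)', "")
        lst = lst.split(",")
        for x in lst:
            x = x.strip()  # drop the whitespace left around each trait
            temp_list.append(x)
    return temp_list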
input_19 = [{'name':'Hector', 'team_position':'forward', 'player_traits':'Finesse Shot, Speed Dribbler (CPU AI Only)'}, {'name':'Bysim', 'team_position':'forward', 'player_traits':'Long Shot Taker (CPU AI Only)'}, {'name':'Nicolas', 'team_position':'defender', 'player_traits':'Beat Offside Trap, Finesse Shot'}]
input_20 = [{'name':'Johann', 'team_position':'gk', 'player_traits':'GK Long Throw'}, {'name':'Winston', 'team_position':'defender', 'player_traits':'Dives Into Tackles (CPU AI Only)'}, {'name':'Petr', 'team_position':'forward', 'player_traits':'Flair, Long Shot Taker (CPU AI Only)'}]
df_19 = pd.DataFrame(input_19)
df_20 = pd.DataFrame(input_20)
df_list = [df_19, df_20]
for df in df_list:
    df['player_traits'] = df['player_traits'].apply(my_function)
print(df_19)
print(df_20)
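Once the column holds lists, the counting step described in the question (filter on 'team_position', then tally with Counter) could look roughly like the sketch below. The helper names `clean_traits` and `count_traits` are made up for this illustration and do not come from the original post; `clean_traits` is just a condensed variant of `my_function` so that the snippet runs on its own.

```python
from collections import Counter
from itertools import chain

import pandas as pd


def clean_traits(value):
    """Hypothetical helper: split a trait string into a list of stripped names."""
    if not isinstance(value, str):  # e.g. NaN for a player with no traits
        return []
    value = value.replace('(CPU AI Only)', '')
    return [part.strip() for part in value.split(',') if part.strip()]


def count_traits(df, position):
    """Hypothetical helper: tally traits for players at one team position."""
    rows = df.loc[df['team_position'] == position, 'player_traits']
    # Each row is a list of traits; flatten them into one stream for Counter.
    return Counter(chain.from_iterable(rows))


df_19 = pd.DataFrame([
    {'name': 'Hector', 'team_position': 'forward',
     'player_traits': 'Finesse Shot, Speed Dribbler (CPU AI Only)'},
    {'name': 'Bysim', 'team_position': 'forward',
     'player_traits': 'Long Shot Taker (CPU AI Only)'},
    {'name': 'Nicolas', 'team_position': 'defender',
     'player_traits': 'Beat Offside Trap, Finesse Shot'},
])
df_19['player_traits'] = df_19['player_traits'].apply(clean_traits)

forward_counts = count_traits(df_19, 'forward')
print(forward_counts)
```

Filtering with a boolean mask inside the helper avoids creating a separate DataFrame per position; if separate DataFrames are still wanted, the same Counter call works on their 'player_traits' column.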