I have a dataset in which some columns contain either data or NaN:
import numpy as np
import pandas as pd

# pd.np is deprecated (removed in pandas 2.0), so NaN comes from NumPy directly
rows_dict = {'category': {305: 'Seasonings, Condiments, Toppings & Sauces',
                          536: 'Seasonings, Condiments, Toppings & Sauces',
                          627: 'Commercial Snacks'},
             'histamine_level': {305: np.nan, 536: np.nan, 627: np.nan},
             'food_name': {305: 'Peppermint', 536: 'Peppermint',
                           627: 'Peppermint flavored candy'},
             'oxalate_level': {305: 'Low', 536: np.nan, 627: np.nan},
             'salicylate_level': {305: np.nan, 536: 'Very High',
                                  627: 'High'}}
pd.DataFrame(rows_dict)
So I am trying to "merge" the rows that show this pattern. To do that, I wrote a function that tries to take advantage of how `or` behaves:
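def merge_2_rows(df, left_index, right_index):
    row_dict = {}
    columns_list = df.columns
    for column_name in columns_list:
        # Intended: take the left value, falling back to the right one
        row_dict[column_name] = (df.loc[left_index, column_name]
                                 or df.loc[right_index, column_name])
    # Drop the two source rows, then append the merged row
    match_series = df.index.isin([left_index, right_index])
    df = df[~match_series]
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df, pd.DataFrame([row_dict], columns=columns_list)],
                   ignore_index=True)
    return df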
But when I run merge_2_rows(df=a_copy_of_the_above_df, left_index=305, right_index=536), I do not get the merged row I expected.
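If the first index holds NaN, the `or` expression stops there and never checks the second index, because NaN is truthy in Python. So this doesn't work. I have looked at pd.merge, and there may be a Series function that does this, but I can't find it. How can I merge the contents of two rows with alternating NaNs without adding extra columns?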
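To make the short-circuit behaviour concrete, here is a minimal sketch using plain Python values rather than the DataFrame. The names `left`, `right`, `picked` and `picked_ok` are only illustrative:

```python
import numpy as np
import pandas as pd

left, right = np.nan, "Low"

# NaN is a non-zero float, so Python treats it as truthy
print(bool(left))

# `or` returns its first truthy operand, so the NaN on the left wins
picked = left or right
print(picked)

# Testing for missing values explicitly picks the non-missing one instead
picked_ok = right if pd.isna(left) else left
print(picked_ok)
```

This is why the loop above keeps NaN whenever the left row is the one with the missing value.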
Based on a hint from @Yuca, this change to merge_2_rows is simpler and actually works:
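The updated function is not reproduced in this excerpt. As a rough sketch of one approach that fits the description, the version below assumes `Series.combine_first` is an acceptable way to fill one row's NaNs from the other. It may not be the exact change the hint suggested, and the name `merge_2_rows_sketch` and the trimmed three-column sample data are only for illustration:

```python
import numpy as np
import pandas as pd

def merge_2_rows_sketch(df, left_index, right_index):
    # Assumption: combine_first stands in for the hinted change.
    # Keep the left row's values and fill its NaNs from the right row.
    merged = df.loc[left_index].combine_first(df.loc[right_index])
    # Drop the two source rows and append the merged row
    remaining = df.drop(index=[left_index, right_index])
    return pd.concat([remaining, merged.to_frame().T], ignore_index=True)

# A subset of the columns from the question's data
df = pd.DataFrame({
    'food_name': {305: 'Peppermint', 536: 'Peppermint',
                  627: 'Peppermint flavored candy'},
    'oxalate_level': {305: 'Low', 536: np.nan, 627: np.nan},
    'salicylate_level': {305: np.nan, 536: 'Very High', 627: 'High'},
})

result = merge_2_rows_sketch(df, left_index=305, right_index=536)
print(result)
```

Because `combine_first` only fills positions that are missing in the left row, a column where both rows hold NaN stays NaN, and no extra columns are created.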