Modifying a pandas DataFrame from within a function

Question · Votes: 0 · Answers: 1

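I find myself modifying several DataFrames with the same operations over and over. I would like to put all the modifications in one function, so that I can call it with the DataFrame's name and have every transformation applied.

Here is the code I am currently trying to use, with all the transformations. When I run it, nothing happens and the DataFrame stays in its original state.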

import numpy as np
import pandas as pd

#create a preprocessing function so the process can be applied to any dataset (training, validation and competition)
def preprocessing(df):
    #inspect dataframe
    df.head()

    #check data types in dataframe
    np.unique(df.dtypes).tolist()

    #inspect shape before removing duplicates
    df.shape

    #drop duplicates
    df = df.drop_duplicates()

    #inspect shape again to see change
    df.shape

    #calculate rows that have a mean of 100 to remove them later
    mean100_rows = [i for i in range(len(df)) if df.iloc[i,0:520].values.mean() == 100 ]

    #calculate columns that have a mean of 100 to remove them later
    mean100_cols = [i for i in np.arange(0,520,1) if df.iloc[:,i].values.mean() == 100 ]

    #calculate columns labels that have a mean of 100 to remove them later
    col_labels = [df.columns[i] for i in mean100_cols]

    #delete rows with mean 100
    df.drop(index = mean100_rows, axis=0, inplace=True)

    #delete columns with mean 100
    df.drop(columns=col_labels, axis=1, inplace=True)

    #export columns that have been removed
    pd.Series(col_labels).to_csv('remove_cols.csv')

    #head
    df.head()

    #check size again
    df.shape
python pandas function dataframe
1 Answer
Votes: 1

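In Python, objects are passed to functions by reference: inside the function, the parameter df is a local name bound to the same DataFrame the caller holds.

When the following line executes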

df = df.drop_duplicates()

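you are only rebinding the function's local name df to the new DataFrame returned by drop_duplicates(); the object outside the function is not changed. The later drop(..., inplace=True) calls then modify that new local object, not the caller's DataFrame.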
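The difference between rebinding a name and mutating the object can be seen in a small sketch. The function names `rebind` and `mutate` and the sample data are made up for illustration and are not part of the question's code.

```python
import pandas as pd

def rebind(df):
    # drop_duplicates() returns a NEW DataFrame; the assignment only changes
    # what the local name df points to, so the caller's object is untouched.
    df = df.drop_duplicates()

def mutate(df):
    # inplace=True modifies the very object the caller also refers to.
    df.drop_duplicates(inplace=True)

original = pd.DataFrame({"a": [1, 1, 2]})

rebind(original)
rows_after_rebind = len(original)   # the duplicate row is still there

mutate(original)
rows_after_mutate = len(original)   # the duplicate row is gone
```

So the function in the question stops affecting the caller's DataFrame from the `df = df.drop_duplicates()` line onward.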

I suggest changing the function so that it returns df, and then assigning its return value to the df variable outside the function.
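A minimal sketch of that suggestion follows. It is one possible rewrite, not the asker's final code: the parameters `n_cols` and `out_path` are hypothetical additions (the question hard-codes 520 and 'remove_cols.csv'), the first `n_cols` columns are assumed to be numeric, and boolean masks replace the positional `drop(index=...)` call, since positions and index labels no longer match after `drop_duplicates()`. Note also that pandas' `mean` skips NaN values, unlike `.values.mean()` in the question.

```python
import pandas as pd

def preprocessing(df, n_cols=520, out_path=None):
    """Return a cleaned copy of df; the caller's DataFrame is left untouched."""
    df = df.drop_duplicates()

    # Never look past the columns the frame actually has.
    n_cols = min(n_cols, df.shape[1])
    block = df.iloc[:, :n_cols]

    # Boolean masks: rows / columns whose mean over the block is exactly 100.
    row_is_100 = block.mean(axis=1) == 100
    col_is_100 = block.mean(axis=0) == 100
    col_labels = block.columns[col_is_100.to_numpy()].tolist()

    # Keep the rows that are not flagged, then drop the flagged columns.
    df = df.loc[~row_is_100].drop(columns=col_labels)

    # Optionally export the labels of the removed columns.
    if out_path is not None:
        pd.Series(col_labels).to_csv(out_path)

    return df

# Usage: assign the return value back to a name outside the function.
raw = pd.DataFrame({
    "w1": [100, 100, 100, 50],
    "w2": [100, 100, 100, 100],
    "label": [1, 1, 2, 3],
})
clean = preprocessing(raw, n_cols=2)
```

Calling `train = preprocessing(train)` would overwrite the original name with the cleaned result, which is the pattern the answer describes.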
