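Good evening, I'm having trouble filling in missing data in a DataFrame.
For each company's ESG data, I want to fill the missing values with the interpolate function if a column has between 1 and 6 missing values, and drop the column if it has more than 6.
The code runs, but it doesn't fill the NaN values.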
import pandas as pd

# Import the file
excel_file_path = 'insert your path'
sp500_esg = pd.read_excel(excel_file_path)

# Rename the date column for ESG
wrongindex2 = sp500_esg.columns[0]
sp500_esg = sp500_esg.rename(columns={wrongindex2: "Date"})

# Set the date column as the index
sp500_esg = sp500_esg.set_index('Date', drop=True)

# Make a copy
try1 = sp500_esg.copy()

# Resample from daily to yearly
try1 = try1.resample('Y').mean()

# Count the total NaNs in each column
print(try1.isnull().sum())

def interpolate_func(try1):
    for column in try1.columns:
        # Count the number of missing values in the column
        missing_count = try1[column].isna().sum()
        # If the number of missing values is between 1 and 6, interpolate to fill them
        if 1 <= missing_count <= 6:
            try1[column] = try1[column].interpolate()
        # If the number of missing values is greater than 6, drop the column
        elif missing_count > 6:
            try1 = try1.drop(column, axis=1)
    return try1
Any ideas?
Thanks!
I also tried passing parameters to interpolate(), without success.
I can't see your original data, but I don't see anything wrong with your code. Here is a small snippet that does what you need:
import pandas as pd

df = pd.DataFrame({'a': [1, 2, None, None] * 4, 'b': [3, 4, None, 5] * 4, 'c': [6, 7, 8, 9] * 4})
print(df)
for column in df.columns:
    n = df[column].isna().sum()
    if n > 6:
        # Too many gaps: drop the column
        df.drop(columns=column, inplace=True)
    elif 1 <= n <= 6:
        # A few gaps (1 to 6 inclusive, as in the question): fill by linear interpolation
        df[column] = df[column].interpolate()
print(df)
      a    b  c
0   1.0  3.0  6
1   2.0  4.0  7
2   NaN  NaN  8
3   NaN  5.0  9
4   1.0  3.0  6
5   2.0  4.0  7
6   NaN  NaN  8
7   NaN  5.0  9
8   1.0  3.0  6
9   2.0  4.0  7
10  NaN  NaN  8
11  NaN  5.0  9
12  1.0  3.0  6
13  2.0  4.0  7
14  NaN  NaN  8
15  NaN  5.0  9

      b  c
0   3.0  6
1   4.0  7
2   4.5  8
3   5.0  9
4   3.0  6
5   4.0  7
6   4.5  8
7   5.0  9
8   3.0  6
9   4.0  7
10  4.5  8
11  5.0  9
12  3.0  6
13  4.0  7
14  4.5  8
15  5.0  9
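Two things may be worth checking in the original code. First, as posted, `interpolate_func` is defined but never called, so `try1` stays unchanged unless you call it and assign the result back (`try1 = interpolate_func(try1)`). Second, the default `interpolate()` only fills forward, so NaNs before a column's first valid value (for example, years before a company's ESG coverage starts) remain NaN; passing `limit_direction="both"` also fills those. Below is a minimal sketch under those assumptions; the sample data and the `max_missing` parameter are hypothetical, not part of the question.

```python
import numpy as np
import pandas as pd


def interpolate_func(df, max_missing=6):
    """Interpolate columns with 1..max_missing NaNs; drop columns with more."""
    df = df.copy()  # work on a copy so the caller's frame is not mutated
    for column in df.columns:
        missing_count = df[column].isna().sum()
        if 1 <= missing_count <= max_missing:
            # limit_direction="both" also fills leading and trailing NaNs,
            # which the default forward-only interpolation leaves in place.
            df[column] = df[column].interpolate(limit_direction="both")
        elif missing_count > max_missing:
            df = df.drop(columns=column)
    return df


# Hypothetical yearly data: "x" has a leading NaN and an inner NaN,
# "y" has 7 NaNs (more than 6), "z" has none.
try1 = pd.DataFrame(
    {
        "x": [np.nan, 1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0],
        "y": [np.nan] * 7 + [1.0],
        "z": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

# The function must actually be called, and its result assigned back.
try1 = interpolate_func(try1)
print(try1)
```

Here "y" is dropped, the leading NaN in "x" takes the first valid value (1.0), and the inner NaN is linearly interpolated to 2.0.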