I have a pandas dataframe with month-based data, as shown below:
df
id Month val
g1 Jan 1
g1 Feb 5
g1 Mar 61
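What I want is the following:
I want to convert the dataframe into a weekly structure, with the month column either replaced or kept, and with a row for every week that can occur in that month. Assuming 4 weeks per month, the output should look like this: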
new_df
id week val
g1 1 1
g1 2 1
g1 3 1
g1 4 1
g1 5 5
g1 6 5
g1 7 5
g1 8 5
g1 9 61
g1 10 61
g1 11 61
g1 12 61
I tried the following function and applied it to the pandas dataframe, but it doesn't work:
Sample code:
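import pandas as pd

df = pd.DataFrame({'id': ['g1', 'g1', 'g1'],
                   'Month': ['Jan', 'Feb', 'Mar'],
                   'val': [1, 5, 61]})

def myfun(mon):
    if mon == 'Jan':
        wk = list(range(1, 5))
    elif mon == 'Feb':
        wk = list(range(5, 9))
    else:
        wk = list(range(9, 13))
    return wk

df['week'] = df.apply(lambda row: myfun(row['Month']), axis=1)
del df['Month']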
This is the output I get, which is not what I want:
id val week
g1 1 [1, 2, 3, 4]
g1 5 [5, 6, 7, 8]
g1 61 [9, 10, 11, 12]
Is there a neat way to achieve this?
Any help is much appreciated. Thanks.
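Since the attempt in the question already produces the right week numbers, only packed into lists, one possible fix is to expand each list into its own rows with DataFrame.explode. A minimal sketch, rebuilding the sample data from the question (ignore_index in explode assumes pandas 1.1 or later):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'id': ['g1', 'g1', 'g1'],
                   'Month': ['Jan', 'Feb', 'Mar'],
                   'val': [1, 5, 61]})

def myfun(mon):
    # Same mapping as in the question: 4 weeks per month
    if mon == 'Jan':
        return list(range(1, 5))
    elif mon == 'Feb':
        return list(range(5, 9))
    return list(range(9, 13))

df['week'] = df['Month'].map(myfun)  # one list of weeks per row

new_df = (df.drop(columns='Month')
            .explode('week', ignore_index=True)  # one row per list element
            .astype({'week': int}))              # explode leaves an object column
new_df = new_df[['id', 'week', 'val']]           # column order as in the question
print(new_df)
```

The val column is simply repeated for every week of its month, so it stays an integer column.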
We can use DataFrame.groupby and DataFrame.reindex with range(4) to give every month 4 rows. On the result we forward-fill (ffill) to replace the NaN values in the new rows.
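To make the reindex/ffill step concrete, here is a small sketch on a single one-row group (the Jan row, assumed to look like what df.groupby('Month') yields). Reindexing to range(4) adds three all-NaN rows, and ffill() copies the month's values down into them. The NaN rows also turn val into a float column, which is why the values end up as 1.0 rather than 1.

```python
import pandas as pd

# One group, as produced by df.groupby('Month') for 'Jan'
d = pd.DataFrame({'id': ['g1'], 'Month': ['Jan'], 'val': [1]})

# reindex(range(4)) keeps row 0 and adds rows 1-3 filled with NaN
extended = d.reset_index(drop=True).reindex(range(4))
print(extended['val'].isna().tolist())  # [False, True, True, True]

# Forward-fill copies the first row's values into the new rows
filled = extended.ffill()
print(filled)
```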
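After that, we convert Month with pandas.to_datetime and take the month number, so that we can sort chronologically. Finally, we create the column Week by taking the index and adding 1, and drop the Month column: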
import pandas as pd

# Extend each month's group to 4 rows (one per week); the new rows are NaN
df_new = pd.concat([
    d.reset_index(drop=True).reindex(range(4))
    for n, d in df.groupby('Month')
], ignore_index=True).ffill()  # forward-fill; fillna(method='ffill') is deprecated since pandas 2.1
# Convert the month abbreviations to month numbers via datetime parsing
df_new["Month"] = pd.to_datetime(df_new.Month, format='%b', errors='coerce').dt.month
# Now we can sort chronologically (groupby ordered the groups alphabetically: Feb, Jan, Mar)
df_new.sort_values('Month', inplace=True)
# Create the Week column from the row position
df_new['Week'] = df_new.reset_index(drop=True).index + 1
# Drop the Month column since we don't need it anymore
df_new.drop('Month', axis=1, inplace=True)
df_new.reset_index(drop=True, inplace=True)
Output:
print(df_new)
id val Week
0 g1 1.0 1
1 g1 1.0 2
2 g1 1.0 3
3 g1 1.0 4
4 g1 5.0 5
5 g1 5.0 6
6 g1 5.0 7
7 g1 5.0 8
8 g1 61.0 9
9 g1 61.0 10
10 g1 61.0 11
11 g1 61.0 12
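A loop-free alternative (a different technique from the groupby/reindex approach, sketched under the same assumption of exactly 4 weeks per month) is to repeat each row's values four times with NumPy's repeat and add a 0 to 3 offset from np.tile to each month's first week. Because no NaN rows are created, val stays an integer, and each row is handled independently, so several ids work too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['g1', 'g1', 'g1'],
                   'Month': ['Jan', 'Feb', 'Mar'],
                   'val': [1, 5, 61]})

WEEKS = 4  # assumption from the question: every month has 4 weeks

# Month abbreviation -> month number (1-12)
month_num = pd.to_datetime(df['Month'], format='%b').dt.month.to_numpy()

# First week of each month, repeated 4 times, plus offsets 0, 1, 2, 3
first_week = ((month_num - 1) * WEEKS + 1).repeat(WEEKS)
offset = np.tile(np.arange(WEEKS), len(df))

new_df = pd.DataFrame({
    'id': df['id'].to_numpy().repeat(WEEKS),
    'week': first_week + offset,
    'val': df['val'].to_numpy().repeat(WEEKS),
})
print(new_df)
```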
Try this:
import pandas as pd

# Map the month abbreviations used in df to month numbers ('Mar', not 'March')
month = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
         'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
rows = []  # collect the new rows; DataFrame.append was removed in pandas 2.0
for index, row in df.iterrows():  # for each row in df
    month_num = (month[row['Month']] - 1) * 4 + 1  # starting week of this month
    for i in range(4):  # four weeks per month
        # add a row with the current week number
        rows.append({'id': row['id'], 'week': month_num, 'val': row['val']})
        month_num += 1  # move on to the next week
new_df = pd.DataFrame(rows, columns=['id', 'week', 'val'])
print(new_df)
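The starting week computed in the loop, (month number - 1) * 4 + 1, follows directly from the question's convention of exactly 4 weeks per month (not calendar or ISO weeks). For month number m (Jan = 1) and week-of-month k:

```latex
w(m, k) = 4\,(m - 1) + k, \qquad k = 1, \dots, 4
```

For Mar (m = 3) this gives 4 \cdot 2 + k, i.e. weeks 9 to 12, matching the desired output; k = 1 is the value month_num starts from in the loop.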