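I want to extract the year from a datetime column into a new 'yyyy' column, and I want missing values (NaT) to be shown as 'NaN'. I guess the new column's dtype has to be changed from datetime, but I'm stuck.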
Initial df:
        Date  ID
0 2016-01-01  12
1 2015-01-01  96
2        NaT  20
3 2018-01-01  73
4 2017-01-01  84
5        NaT  26
6 2013-01-01  87
7 2016-01-01  64
8 2019-01-01  11
9 2014-01-01  34
Desired df:
        Date  ID  yyyy
0 2016-01-01  12  2016
1 2015-01-01  96  2015
2        NaT  20   NaN
3 2018-01-01  73  2018
4 2017-01-01  84  2017
5        NaT  26   NaN
6 2013-01-01  87  2013
7 2016-01-01  64  2016
8 2019-01-01  11  2019
9 2014-01-01  34  2014
Code:
import pandas as pd
import numpy as np
# example df
df = pd.DataFrame({"ID": [12, 96, 20, 73, 84, 26, 87, 64, 11, 34],
                   "Date": ['2016-01-01', '2015-01-01', np.nan, '2018-01-01', '2017-01-01',
                            np.nan, '2013-01-01', '2016-01-01', '2019-01-01', '2014-01-01']})
df.ID = pd.to_numeric(df.ID)
df.Date = pd.to_datetime(df.Date)
print(df)
# extract the year from the date
df['yyyy'] = pd.to_datetime(df.Date).dt.strftime('%Y')
# attempts to turn NaT into NaN, or the datetime into a number; PROBLEM: empty cells keep 'NaT'
df.loc[(df['yyyy'].isna()), 'yyyy'] = np.nan  # (try 1)
df.yyyy = df.Date.astype(float)  # (try 2)
df.yyyy = pd.to_numeric(df.Date)  # (try 3)
print(df)
Instead of:
Use:
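One possible approach, offered as a minimal sketch rather than the definitive answer: read the year with the `.dt.year` accessor instead of formatting it with `strftime`. With `.dt.year` the result is numeric and rows holding `NaT` come out as `NaN`. Because `NaN` is a float, the column is stored as float (for example 2016.0). If whole numbers are preferred, pandas' nullable `Int64` dtype may help, although it prints missing values as `<NA>` rather than `NaN`. The column name `yyyy_int` is only an illustrative name and does not appear in the question.

```python
import numpy as np
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    "ID": [12, 96, 20, 73, 84, 26, 87, 64, 11, 34],
    "Date": ["2016-01-01", "2015-01-01", np.nan, "2018-01-01", "2017-01-01",
             np.nan, "2013-01-01", "2016-01-01", "2019-01-01", "2014-01-01"],
})
df["Date"] = pd.to_datetime(df["Date"])  # missing entries become NaT

# .dt.year gives numbers; NaT rows become NaN, so the column is float
df["yyyy"] = df["Date"].dt.year

# Optional (illustrative column name): nullable integers, missing shown as <NA>
df["yyyy_int"] = df["Date"].dt.year.astype("Int64")

print(df)
```

Which variant fits depends on whether the literal text `NaN` matters in the output or only the fact that the value is missing.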