Implementing an alternative solution with pandas `transform`


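I am analyzing the TMDB dataset on Kaggle. For some entries, the year in the `release_date` variable is shifted relative to the `release_year` variable (by 100 years in the rows shown below, e.g. 2062 instead of 1962):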

import pandas as pd

# Convert to pandas datetime
tmdb_df['release_date'] = pd.to_datetime(tmdb_df['release_date'])

# Show rows whose parsed date falls after the end of 2015
tmdb_df.query("release_date > '2015-12-31'")[['release_date', 'release_year']].head()
# Output:
#      release_date  release_year
# 9849   2062-10-04          1962
# 9850   2062-12-10          1962
# 9851   2062-06-13          1962
# 9852   2062-12-25          1962
# 9853   2062-10-24          1962
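
A more direct way to spot the affected rows might be to compare the year of the parsed date with `release_year`, rather than filtering on a cutoff date. This is a minimal sketch; the two-row `sample` frame is made-up data standing in for `tmdb_df`, and `mismatch` is a name introduced only for illustration.

```python
import pandas as pd

# Made-up two-row sample standing in for tmdb_df (an assumption, not the real data)
sample = pd.DataFrame({
    'release_date': pd.to_datetime(['2062-10-04', '1980-12-10']),
    'release_year': [1962, 1980],
})

# True where the year of the parsed date disagrees with release_year
mismatch = sample['release_date'].dt.year != sample['release_year']

# Only the shifted rows remain
print(sample[mismatch])
```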

I came up with a solution using `apply`:

# Check for movies where the year in `release_date` is shifted
# when compared with `release_year`
import datetime

# Convert to pandas datetime
tmdb_df['release_date'] = pd.to_datetime(tmdb_df['release_date'])

def aux_func(row):
    """Return the release date with its year set to `release_year` if they differ."""
    if row['release_date'].year != row['release_year']:
        return row['release_date'].replace(year=row['release_year'])
    else:
        return row['release_date']

# Apply the fix row by row
tmdb_df['release_date'] = tmdb_df[['release_date', 'release_year']].apply(aux_func, axis=1)

However, I would like to know whether this can be solved with pandas `transform`, or whether there is another approach.

python pandas dataframe
1 Answer

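If you want the years to match, first join `release_year` with the date string stripped of its year, then parse the result. Note that this relies on `release_date` still being a string column: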

import pandas as pd

df = pd.DataFrame({'release_date':['2062-10-04','1980-12-10'],'release_year':[1962,1980]})
print(df)
  release_date  release_year
0   2062-10-04          1962
1   1980-12-10          1980

# Prepend the correct year to the '-MM-DD' part of the string, then parse
df['release_date'] = pd.to_datetime(df['release_year'].astype(str) +
                                    df['release_date'].str[4:])

print(df)

  release_date  release_year
0   1962-10-04          1962
1   1980-12-10          1980
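
If `release_date` has already been converted with `pd.to_datetime` (as in the question), the `.str[4:]` slice no longer applies. One possible alternative, sketched here under that assumption, is to reassemble the date from `release_year` plus the month and day of the existing column, since `pd.to_datetime` accepts a DataFrame with `year`, `month` and `day` columns. The `parts` frame is a helper name introduced only for illustration.

```python
import pandas as pd

# Same sample data as above, but with release_date already parsed to datetime
df = pd.DataFrame({
    'release_date': pd.to_datetime(['2062-10-04', '1980-12-10']),
    'release_year': [1962, 1980],
})

# Take the year from release_year and the month/day from the existing date
parts = pd.DataFrame({
    'year': df['release_year'],
    'month': df['release_date'].dt.month,
    'day': df['release_date'].dt.day,
})

# to_datetime assembles one datetime per row from the year/month/day columns
df['release_date'] = pd.to_datetime(parts)
print(df)
```

This stays vectorized and avoids the row-wise `apply`. A February 29 date combined with a non-leap `release_year` would not form a valid date, so such rows would need separate handling.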