How can we convert a dask_cudf column of strings or nanoseconds to datetime objects?
to_datetime is available in both pandas and cudf. See the sample data below:
import pandas
import cudf
# with pandas
df = pandas.DataFrame( {'city' : ['Dallas','Bogota','Chicago','Juarez'],
'timestamp' : [1664828099973725440,1664828099972763136,1664828094775313920,1664828081313273856]})
df['datetime'] = pandas.to_datetime(df['timestamp'])
# with cudf
cdf = cudf.DataFrame( {'city' : ['Dallas','Bogota','Chicago','Juarez'],
'timestamp' : [1664828099973725440,1664828099972763136,1664828094775313920,1664828081313273856]})
cdf['datetime'] = cudf.to_datetime(cdf['timestamp'])
print(df)
print(cdf)
In either case, the result is the same:
city timestamp datetime
0 Dallas 1664828099973725440 2022-10-03 20:14:59.973725440
1 Bogota 1664828099972763136 2022-10-03 20:14:59.972763136
2 Chicago 1664828094775313920 2022-10-03 20:14:54.775313920
3 Juarez 1664828081313273856 2022-10-03 20:14:41.313273856
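For reference, the timestamp integers above are nanoseconds since the Unix epoch. A minimal sketch of that interpretation using only the Python standard library is below. The helper name ns_to_datetime is my own and is not part of pandas or cudf. Python's datetime only has microsecond resolution, so the last three digits are dropped, and the result is timezone-aware (UTC) rather than naive as in the output above.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime (UTC)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ns_to_datetime(ns: int) -> datetime:
    """Interpret an integer as nanoseconds since the Unix epoch.

    Hypothetical helper for illustration only. Integer division by 1000
    truncates to microseconds, the finest resolution datetime supports.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)

# First row of the sample data (Dallas)
print(ns_to_datetime(1664828099973725440))
# prints 2022-10-03 20:14:59.973725+00:00
```

This agrees with the first row of the pandas and cudf output up to the truncated sub-microsecond digits.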
This recent SO question suggests using dask:
import dask_cudf
from dask import dataframe as dd
ddf = dask_cudf.from_cudf(cdf, npartitions=2)
dd.to_datetime(ddf['timestamp']).head()
This produces an error. I am creating the dask_cudf DataFrame from a large number of CSV files in a directory.