I'm trying to create a dask_cudf DataFrame, but I get an error.
import dask_cudf
import cudf
import pandas as pd
# Example pandas DataFrame with a datetime string column
pdf = pd.DataFrame({'datetime_str': ['2024-03-19 12:00:00', '2024-03-19 10:00:00', '2024-03-19 11:00:00']})
# Convert the pandas DataFrame to a cuDF DataFrame
cdf = cudf.from_pandas(pdf)
# Convert the cuDF DataFrame to a Dask cuDF DataFrame
ddf = dask_cudf.from_cudf(cdf, npartitions=2) # error
I get this error:
AttributeError: DataFrame object has no attribute map_partitions
Here is what I found:
cudf.core.dataframe.DataFrame # no map_partitions
dask_cudf.DataFrame.map_partitions
dask_cudf.core.map_partitions
dask_cudf.core.DataFrame.map_partitions
dask.dataframe.map_partitions
How can I make dask_cudf.from_cudf access map_partitions? Thanks.
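According to the documentation, dask_cudf.from_cudf is a thin wrapper around dask.dataframe.from_pandas(). The first argument, data, is expected to be a pandas.DataFrame or pandas.Series.
So the data should be loaded like this: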
import dask_cudf
import pandas as pd
pdf = pd.DataFrame({
    'datetime_str': [
        '2024-03-19 12:00:00',
        '2024-03-19 10:00:00',
        '2024-03-19 11:00:00']})
# Pass the pandas DataFrame directly to dask_cudf.from_cudf
ddf = dask_cudf.from_cudf(pdf, npartitions=2)