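How do I create a simple UDF in DuckDB that operates on pyarrow objects?
As an MRE, here is an attempt to reimplement the sqrt function with pyarrow. DuckDB only gives the cryptic error listed below.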
import pyarrow as pa
import duckdb
def test(x):
    return pa.compute.sqrt(x)
duckdb.create_function(
    'test',
    test,
    type = 'arrow',
    parameters = [ int ],
    return_type = float,
)
N = 1000
duckdb.sql(f"""
create table t (a int);
insert into t (a) select * from generate_series(1, {N});
alter table t add column b float;
update t set b = test(a);
""")
Error:
Traceback (most recent call last):
  File "/Users/w/Codin/duckdb_rust_udf/test.py", line 8, in <module>
    duckdb.create_function(
TypeError: create_function(): incompatible function arguments. The following argument types are supported:
    1. (name: str, function: function, return_type: object = None, parameters: duckdb.duckdb.typing.DuckDBPyType = None, *, type: duckdb.duckdb.functional.PythonUDFType = <PythonUDFType.NATIVE: 0>, null_handling: duckdb.duckdb.functional.FunctionNullHandling = 0, exception_handling: duckdb.duckdb.PythonExceptionHandling = 0, side_effects: bool = False, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyConnection
Invoked with: 'test', <function test at 0x100237d90>; kwargs: type='arrow', parameters=[<class 'int'>], return_type=<class 'int'>
I think you need to use type annotations. Otherwise, you need to pass types from duckdb.typing, such as INTEGER, FLOAT, etc.:
import pyarrow as pa
import duckdb
def arrow_udf(x: int) -> float:
    return pa.compute.sqrt(x)
duckdb.create_function(
    "arrow_udf",
    arrow_udf,
    # [duckdb.typing.INTEGER], <- needed if not using type annotations
    # duckdb.typing.FLOAT,
    type = "arrow"
)
duckdb.sql("""
from (from generate_series(1, 10) t(A))
select arrow_udf(A)
""").show()
┌────────────────────┐
│    arrow_udf(A)    │
│       double       │
├────────────────────┤
│                1.0 │
│ 1.4142135623730951 │
│ 1.7320508075688772 │
│                2.0 │
│   2.23606797749979 │
│  2.449489742783178 │
│ 2.6457513110645907 │
│ 2.8284271247461903 │
│                3.0 │
│ 3.1622776601683795 │
├────────────────────┤
│      10 rows       │
└────────────────────┘
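Your example works on main; I think this is probably a case of the type stubs diverging from the implementation.
The stubs were recently switched to being generated to fix exactly this kind of problem. You can install a nightly build to test this, or wait for the 0.10.3 release.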