I have some Delta tables and Postgres (Amazon RDS) tables that AWS Glue crawlers registered in the AWS Glue Data Catalog.
I initially created an Amazon EMR cluster:
aws emr create-cluster \
--name hm-amazon-emr-cluster \
--applications Name=Trino \
--configurations file://configurations.json
# ...
configurations.json
[
{
"Classification": "delta-defaults",
"Properties": {"delta.enabled": "true"}
},
{
"Classification": "trino-connector-delta",
"Properties": {"hive.metastore": "glue"}
}
]
This is based on the AWS guide.
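Before passing the file to --configurations file://configurations.json, it can help to check its shape locally. Below is a minimal sketch that parses the JSON and checks that every entry has a Classification and that every property value is a string. As far as I know, the EMR API models Properties as a string-to-string map, which is why the file uses "true" rather than a JSON boolean. The function name is hypothetical, and the check covers only the flat shape used in this post (nested Configurations are not validated).

```python
import json


def validate_emr_configurations(text: str) -> list[str]:
    """Parse an EMR configurations document and return its classification names.

    Only checks the flat shape used in this post: a JSON array of objects, each
    with a non-empty "Classification" string and a "Properties" object whose
    values are all strings.
    """
    configs = json.loads(text)
    if not isinstance(configs, list):
        raise ValueError("top-level value must be a JSON array")
    classifications = []
    for i, entry in enumerate(configs):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} must be a JSON object")
        name = entry.get("Classification")
        if not isinstance(name, str) or not name:
            raise ValueError(f"entry {i} is missing a Classification string")
        props = entry.get("Properties", {})
        if not isinstance(props, dict):
            raise ValueError(f"{name}: Properties must be a JSON object")
        for key, value in props.items():
            # "delta.enabled" must be the string "true", not the JSON boolean true
            if not isinstance(value, str):
                raise ValueError(f"{name}: property {key!r} must be a string")
        classifications.append(name)
    return classifications


# The configurations.json from the question, inlined for the example
CONFIG = """
[
  {"Classification": "delta-defaults", "Properties": {"delta.enabled": "true"}},
  {"Classification": "trino-connector-delta", "Properties": {"hive.metastore": "glue"}}
]
"""

print(validate_emr_configurations(CONFIG))
```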
Now I can successfully query the Delta table in DBeaver through Trino on Amazon EMR:
SELECT * FROM delta.my_db.engine_demo
However, when I query the Postgres table in DBeaver through Trino on Amazon EMR:
SELECT * FROM delta.my_db.myrds_public_engine_demo_metadata
I get this error:
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [133001]: Query failed (#20230816_001032_00001_ja739): my_db.myrds_public_engine_demo_metadata is not a Delta Lake table
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:133)
at org.jkiss.dbeaver.model.impl.jdbc.struct.JDBCTable.readData(JDBCTable.java:186)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:123)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:173)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:121)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:5035)
at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:105)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: java.sql.SQLException: Query failed (#20230816_001032_00001_ja739): my_db.myrds_public_engine_demo_metadata is not a Delta Lake table
at io.trino.jdbc.AbstractTrinoResultSet.resultsException(AbstractTrinoResultSet.java:1937)
at io.trino.jdbc.TrinoResultSet.getColumns(TrinoResultSet.java:318)
at io.trino.jdbc.TrinoResultSet.create(TrinoResultSet.java:61)
at io.trino.jdbc.TrinoStatement.internalExecute(TrinoStatement.java:262)
at io.trino.jdbc.TrinoStatement.execute(TrinoStatement.java:240)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.execute(JDBCStatementImpl.java:329)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.lambda$0(JDBCStatementImpl.java:131)
at org.jkiss.dbeaver.utils.SecurityManagerUtils.wrapDriverActions(SecurityManagerUtils.java:96)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:131)
... 7 more
Caused by: io.trino.plugin.deltalake.metastore.NotADeltaLakeTableException: my_db.myrds_public_engine_demo_metadata is not a Delta Lake table
at io.trino.plugin.deltalake.metastore.HiveMetastoreBackedDeltaLakeMetastore.verifyDeltaLakeTable(HiveMetastoreBackedDeltaLakeMetastore.java:135)
at java.base/java.util.Optional.ifPresent(Optional.java:178)
at io.trino.plugin.deltalake.metastore.HiveMetastoreBackedDeltaLakeMetastore.getTable(HiveMetastoreBackedDeltaLakeMetastore.java:124)
at io.trino.plugin.deltalake.DeltaLakeMetadata.getTableHandle(DeltaLakeMetadata.java:428)
at io.trino.spi.connector.ConnectorMetadata.getTableHandle(ConnectorMetadata.java:123)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getTableHandle(ClassLoaderSafeConnectorMetadata.java:1086)
at io.trino.tracing.TracingConnectorMetadata.getTableHandle(TracingConnectorMetadata.java:143)
at io.trino.metadata.MetadataManager.lambda$getTableHandle$5(MetadataManager.java:283)
at java.base/java.util.Optional.flatMap(Optional.java:289)
at io.trino.metadata.MetadataManager.getTableHandle(MetadataManager.java:277)
at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1561)
at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1553)
at io.trino.tracing.TracingMetadata.getRedirectionAwareTableHandle(TracingMetadata.java:1265)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.getTableHandle(StatementAnalyzer.java:5389)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:2167)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:482)
at io.trino.sql.tree.Table.accept(Table.java:60)
at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:499)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:4448)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:2931)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:482)
at io.trino.sql.tree.QuerySpecification.accept(QuerySpecification.java:155)
at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:499)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:507)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1458)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:482)
at io.trino.sql.tree.Query.accept(Query.java:107)
at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:499)
at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:461)
at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:97)
at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:86)
at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:271)
at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:206)
at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:845)
at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:154)
at io.trino.$gen.Trino_414____20230807_181353_2.call(Unknown Source)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
I suspect this is related to the configurations I provided when creating the Amazon EMR cluster, because they only set up Delta support and nothing for Postgres:
[
{
"Classification": "delta-defaults",
"Properties": {"delta.enabled": "true"}
},
{
"Classification": "trino-connector-delta",
"Properties": {"hive.metastore": "glue"}
}
]
However, I am not sure which configurations I should provide to support both Delta tables and Postgres tables.
This is the classic Trino configuration I would expect to write:
# Delta Lake catalog (e.g. etc/catalog/delta.properties)
connector.name=delta-lake
# In my case this would be hive.metastore=glue instead of a Thrift URI
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=https://s3.us-west-2.amazonaws.com
hive.s3.aws-access-key=abc
hive.s3.aws-secret-key=xxx

# PostgreSQL catalog (e.g. etc/catalog/postgresql.properties)
connector.name=postgresql
connection-url=jdbc:postgresql://my_psql:5432/mydb
connection-user=abc
connection-password=xxx
However, I am not sure how to translate this into the configurations used when creating the Amazon EMR Trino cluster.
Any guidance would be appreciated, thanks!
I think the problem is that each Trino catalog is backed by exactly one connector type.
Even though I successfully registered the Postgres (Amazon RDS) tables in the AWS Glue Data Catalog with an AWS Glue crawler, because I used
{
"Classification": "trino-connector-delta",
"Properties": {
"hive.metastore": "glue"
}
}
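the delta catalog expects every table it reads from the AWS Glue Data Catalog to be a Delta table, so the Delta Lake connector rejects the crawled Postgres table (the verifyDeltaLakeTable frame in the stack trace above).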
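The failure can be modeled with a toy resolver: the first part of catalog.schema.table selects the catalog, and each catalog is bound to exactly one connector. Anything queried as delta.<schema>.<table> is therefore handed to the Delta Lake connector, no matter what kind of source the Glue crawler registered the table from. This is a simplified sketch, not Trino's code. It ignores quoted identifiers and table redirection, and the catalog name postgresql is an assumption based on the trino-connector-postgresql classification name.

```python
# Hypothetical catalog -> connector mapping. The connector names mirror the
# connector.name values in the classic Trino configuration from the question;
# the catalog names "delta" and "postgresql" are assumptions.
CATALOG_CONNECTORS = {
    "delta": "delta-lake",
    "postgresql": "postgresql",
}


def resolve(qualified_name: str) -> tuple[str, str, str, str]:
    """Split catalog.schema.table and return (connector, catalog, schema, table).

    Simplified model: no quoted identifiers, no table redirection.
    """
    parts = qualified_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {qualified_name!r}")
    catalog, schema, table = parts
    try:
        connector = CATALOG_CONNECTORS[catalog]
    except KeyError:
        raise ValueError(f"unknown catalog {catalog!r}") from None
    return connector, catalog, schema, table


# The failing query from the question is routed to the Delta Lake connector
print(resolve("delta.my_db.myrds_public_engine_demo_metadata")[0])  # delta-lake
```

Because the catalog, not the Glue metadata, picks the connector, the fix is to expose the RDS database through a second catalog rather than through the Glue database that the delta catalog reads.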
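So I changed approach and registered my Amazon RDS (Postgres) database as its own Trino catalog instead, by providing a PostgreSQL connector configuration when creating the cluster:
aws emr create-cluster \
--name=hm-amazon-emr-cluster \
--applications=Name=Trino \
--configurations=file://configurations.json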
configurations.json:
[
{
"Classification": "delta-defaults",
"Properties": {
"delta.enabled": "true"
}
},
{
"Classification": "trino-connector-delta",
"Properties": {
"hive.metastore": "glue"
}
},
{
"Classification": "trino-connector-postgresql",
"Properties": {
"connection-url": "jdbc:postgresql://xxx.xxx.us-west-2.rds.amazonaws.com/my_db",
"connection-user": "abc",
"connection-password": "xxx"
}
}
]
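To avoid typing the RDS password into configurations.json by hand, the same file could be generated. Below is a minimal sketch that builds the three classifications above as Python data and prints them as JSON. The environment variable names RDS_URL, RDS_USER and RDS_PASSWORD are hypothetical, and the fallback values are just the placeholders from this post.

```python
import json
import os


def build_configurations(rds_url: str, rds_user: str, rds_password: str) -> list[dict]:
    """Return the EMR configurations from this answer as Python data."""
    return [
        # Enable Delta Lake support on the cluster
        {"Classification": "delta-defaults",
         "Properties": {"delta.enabled": "true"}},
        # "delta" catalog: Delta Lake tables whose metadata lives in AWS Glue
        {"Classification": "trino-connector-delta",
         "Properties": {"hive.metastore": "glue"}},
        # Separate PostgreSQL catalog that reads Amazon RDS over JDBC
        {"Classification": "trino-connector-postgresql",
         "Properties": {
             "connection-url": rds_url,
             "connection-user": rds_user,
             "connection-password": rds_password,
         }},
    ]


if __name__ == "__main__":
    # Hypothetical variable names; fallbacks are the post's placeholders,
    # not real credentials.
    configs = build_configurations(
        os.environ.get("RDS_URL", "jdbc:postgresql://xxx.xxx.us-west-2.rds.amazonaws.com/my_db"),
        os.environ.get("RDS_USER", "abc"),
        os.environ.get("RDS_PASSWORD", "xxx"),
    )
    print(json.dumps(configs, indent=2))
```

Redirecting the script's output to configurations.json produces the file passed to --configurations, while the real password stays out of anything committed to version control.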
Now I can read all the Postgres (Amazon RDS) tables through Trino on Amazon EMR.
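More generally, the classic catalog files from the question map onto EMR classifications almost one to one. Below is a minimal sketch of that translation, assuming the trino-connector-<catalog> naming seen in this post. The working configuration above never sets connector.name, so the sketch drops it. For the Delta catalog you would still swap hive.metastore.uri for hive.metastore=glue by hand, as the answer does. The function names are hypothetical, and the parser handles only simple key=value lines, not the full java.util.Properties syntax.

```python
import json


def parse_properties(text: str) -> dict[str, str]:
    """Minimal parser for simple key=value Trino catalog files.

    Handles only full-line '#' comments and key=value lines; escapes, line
    continuations and ':' separators are not supported.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def to_emr_classification(catalog: str, properties_text: str) -> dict:
    """Turn the contents of <catalog>.properties into one EMR configuration entry."""
    props = parse_properties(properties_text)
    # The working EMR configuration above does not set connector.name
    props.pop("connector.name", None)
    return {"Classification": f"trino-connector-{catalog}", "Properties": props}


# The PostgreSQL catalog file from the question
POSTGRES_PROPERTIES = """\
connector.name=postgresql
connection-url=jdbc:postgresql://my_psql:5432/mydb
connection-user=abc
connection-password=xxx
"""

print(json.dumps(to_emr_classification("postgresql", POSTGRES_PROPERTIES), indent=2))
```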