How can Trino on Amazon EMR support both Delta tables and Postgres tables in the AWS Glue Data Catalog?


I have some Delta tables and Postgres (Amazon RDS) tables that an AWS Glue crawler registered in the AWS Glue Data Catalog.

I originally created an Amazon EMR cluster with:

aws emr create-cluster \
    --name hm-amazon-emr-cluster \
    --applications Name=Trino \
    --configurations file://configurations.json
    # ...

configurations.json:

[
  {
    "Classification": "delta-defaults",
    "Properties": {"delta.enabled": "true"}
  },
  {
    "Classification": "trino-connector-delta",
    "Properties": {"hive.metastore": "glue"}
  }
]

This is based on the AWS guide.

Now I can successfully query the Delta table in DBeaver through Trino on Amazon EMR:

SELECT * FROM delta.my_db.engine_demo

However, when I query the Postgres table in DBeaver through Trino on Amazon EMR:

SELECT * FROM delta.my_db.myrds_public_engine_demo_metadata

I get this error:

org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [133001]: Query failed (#20230816_001032_00001_ja739): my_db.myrds_public_engine_demo_metadata is not a Delta Lake table
    at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:133)
    at org.jkiss.dbeaver.model.impl.jdbc.struct.JDBCTable.readData(JDBCTable.java:186)
    at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:123)
    at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:173)
    at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:121)
    at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:5035)
    at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:105)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: java.sql.SQLException: Query failed (#20230816_001032_00001_ja739): my_db.myrds_public_engine_demo_metadata is not a Delta Lake table
    at io.trino.jdbc.AbstractTrinoResultSet.resultsException(AbstractTrinoResultSet.java:1937)
    at io.trino.jdbc.TrinoResultSet.getColumns(TrinoResultSet.java:318)
    at io.trino.jdbc.TrinoResultSet.create(TrinoResultSet.java:61)
    at io.trino.jdbc.TrinoStatement.internalExecute(TrinoStatement.java:262)
    at io.trino.jdbc.TrinoStatement.execute(TrinoStatement.java:240)
    at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.execute(JDBCStatementImpl.java:329)
    at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.lambda$0(JDBCStatementImpl.java:131)
    at org.jkiss.dbeaver.utils.SecurityManagerUtils.wrapDriverActions(SecurityManagerUtils.java:96)
    at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:131)
    ... 7 more
Caused by: io.trino.plugin.deltalake.metastore.NotADeltaLakeTableException: my_db.myrds_public_engine_demo_metadata is not a Delta Lake table
    at io.trino.plugin.deltalake.metastore.HiveMetastoreBackedDeltaLakeMetastore.verifyDeltaLakeTable(HiveMetastoreBackedDeltaLakeMetastore.java:135)
    at java.base/java.util.Optional.ifPresent(Optional.java:178)
    at io.trino.plugin.deltalake.metastore.HiveMetastoreBackedDeltaLakeMetastore.getTable(HiveMetastoreBackedDeltaLakeMetastore.java:124)
    at io.trino.plugin.deltalake.DeltaLakeMetadata.getTableHandle(DeltaLakeMetadata.java:428)
    at io.trino.spi.connector.ConnectorMetadata.getTableHandle(ConnectorMetadata.java:123)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getTableHandle(ClassLoaderSafeConnectorMetadata.java:1086)
    at io.trino.tracing.TracingConnectorMetadata.getTableHandle(TracingConnectorMetadata.java:143)
    at io.trino.metadata.MetadataManager.lambda$getTableHandle$5(MetadataManager.java:283)
    at java.base/java.util.Optional.flatMap(Optional.java:289)
    at io.trino.metadata.MetadataManager.getTableHandle(MetadataManager.java:277)
    at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1561)
    at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1553)
    at io.trino.tracing.TracingMetadata.getRedirectionAwareTableHandle(TracingMetadata.java:1265)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.getTableHandle(StatementAnalyzer.java:5389)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:2167)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:482)
    at io.trino.sql.tree.Table.accept(Table.java:60)
    at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:499)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:4448)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:2931)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:482)
    at io.trino.sql.tree.QuerySpecification.accept(QuerySpecification.java:155)
    at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:499)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:507)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1458)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:482)
    at io.trino.sql.tree.Query.accept(Query.java:107)
    at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:499)
    at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:461)
    at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:97)
    at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:86)
    at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:271)
    at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:206)
    at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:845)
    at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:154)
    at io.trino.$gen.Trino_414____20230807_181353_2.call(Unknown Source)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

I suspect this is related to the `configurations` I provided when creating the Amazon EMR cluster, because that configuration only enables Delta support, not Postgres.

However, I am not sure what the correct `configurations` should be to support both Delta tables and Postgres tables.

Here is the classic Trino configuration I have in mind:

# etc/catalog/delta.properties
connector.name=delta-lake
# In my case, this would be Glue (hive.metastore=glue) instead of a Thrift URI
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=https://s3.us-west-2.amazonaws.com
hive.s3.aws-access-key=abc
hive.s3.aws-secret-key=xxx

# etc/catalog/postgresql.properties
connector.name=postgresql
connection-url=jdbc:postgresql://my_psql:5432/mydb
connection-user=abc
connection-password=xxx

However, I am not sure how to translate this into the `configurations` used when creating an Amazon EMR Trino cluster.

Any guidance would be appreciated, thanks!

postgresql amazon-web-services aws-glue amazon-emr trino
1 Answer

I think the problem is that in Trino each catalog is bound to exactly one connector type.

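Even though the AWS Glue crawler successfully registered the Postgres (Amazon RDS) tables in the AWS Glue Data Catalog, I was using

  {
    "Classification": "trino-connector-delta",
    "Properties": {
      "hive.metastore": "glue"
    }
  }

so Trino's `delta` catalog expects every table it reads from the AWS Glue Data Catalog to be a Delta table.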
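To make the one-connector-per-catalog point concrete, here is a toy Python model. It is not Trino's code: the dictionary standing in for the Glue Data Catalog, the `resolve` helper, and the `NotADeltaLakeTableError` class are hypothetical names chosen to mirror the query and the error in the question. It only shows that any name starting with `delta.` is routed to the Delta Lake connector, regardless of what kind of table Glue holds under that name.

```python
# Toy model only: this is NOT Trino's implementation. It illustrates that a
# fully qualified name catalog.schema.table is routed to the single connector
# bound to that catalog.


class NotADeltaLakeTableError(Exception):
    """Hypothetical stand-in for the NotADeltaLakeTableException in the stack trace."""


# Hypothetical view of what the Glue crawler registered in my_db.
GLUE_DATA_CATALOG = {
    ("my_db", "engine_demo"): "delta",
    ("my_db", "myrds_public_engine_demo_metadata"): "jdbc",
}


def delta_lake_connector(schema: str, table: str) -> str:
    table_format = GLUE_DATA_CATALOG[(schema, table)]
    if table_format != "delta":
        # The delta catalog only understands Delta tables, whatever else Glue holds.
        raise NotADeltaLakeTableError(f"{schema}.{table} is not a Delta Lake table")
    return f"scan Delta table {schema}.{table}"


# Each catalog name maps to exactly one connector.
CATALOGS = {"delta": delta_lake_connector}


def resolve(qualified_name: str) -> str:
    catalog, schema, table = qualified_name.split(".")
    return CATALOGS[catalog](schema, table)


if __name__ == "__main__":
    print(resolve("delta.my_db.engine_demo"))
    try:
        resolve("delta.my_db.myrds_public_engine_demo_metadata")
    except NotADeltaLakeTableError as err:
        print(f"error: {err}")
```

Running it prints the Delta scan for `engine_demo`, then the same "is not a Delta Lake table" message as the stack trace for the crawled RDS table.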

So I changed approach and registered my Amazon RDS (Postgres) database at the Trino "catalog" level instead, by providing:
aws emr create-cluster \
    --name=hm-amazon-emr-cluster \
    --applications=Name=Trino \
    --configurations=file://configurations.json

configurations.json:

[
  {
    "Classification": "delta-defaults",
    "Properties": {
      "delta.enabled": "true"
    }
  },
  {
    "Classification": "trino-connector-delta",
    "Properties": {
      "hive.metastore": "glue"
    }
  },
  {
    "Classification": "trino-connector-postgresql",
    "Properties": {
      "connection-url": "jdbc:postgresql://xxx.xxx.us-west-2.rds.amazonaws.com/my_db",
      "connection-user": "abc",
      "connection-password": "xxx"
    }
  }
]

Now I can read all of the Postgres (Amazon RDS) tables through Trino on Amazon EMR.
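If you already maintain catalogs as classic Trino properties files, like the ones in the question, the translation into EMR `configurations` can be scripted. This is a minimal Python sketch, assuming each `<catalog>.properties` file maps to an EMR classification named `trino-connector-<catalog>` (the pattern used for `delta` and `postgresql` above; check the classification list for your EMR release before relying on it for other connectors). The `CATALOG_FILES` contents and the script name `make_configurations.py` are hypothetical. Because the working configuration above never sets `connector.name` (presumably EMR's own catalog file already does), the sketch drops that key.

```python
import json

# Hypothetical catalog files in classic Trino properties format. Values mirror
# the configuration in this answer; replace the host, user, and password.
CATALOG_FILES = {
    "delta": """
connector.name=delta-lake
hive.metastore=glue
""",
    "postgresql": """
connector.name=postgresql
connection-url=jdbc:postgresql://xxx.xxx.us-west-2.rds.amazonaws.com/my_db
connection-user=abc
connection-password=xxx
""",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; skips blank lines and full-line # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def to_emr_configurations(catalog_files: dict) -> list:
    """Turn each <catalog>.properties file into a trino-connector-<catalog> classification."""
    configs = [{"Classification": "delta-defaults", "Properties": {"delta.enabled": "true"}}]
    for catalog, text in catalog_files.items():
        props = parse_properties(text)
        # Assumption: EMR supplies connector.name itself, as the working config omits it.
        props.pop("connector.name", None)
        configs.append({"Classification": f"trino-connector-{catalog}", "Properties": props})
    return configs


if __name__ == "__main__":
    # Usage: python make_configurations.py > configurations.json
    print(json.dumps(to_emr_configurations(CATALOG_FILES), indent=2))
```

The output is the same list of classifications as the configurations.json in this answer, which you can then pass to `aws emr create-cluster --configurations=file://configurations.json`.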
