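I am trying out Apache Drill. I am new to this whole environment and just want to understand how Apache Drill works.
I am trying to query JSON data stored on S3 using Apache Drill. My bucket was created in US East (N. Virginia). I created a new Storage Plugin for S3 by following this link.
Here is the configuration of my new S3 Storage Plugin: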
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://testing-drill/",
  "config": {
    "fs.s3a.access.key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "fs.s3a.secret.key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}
I have also configured my core-site-example.xml as follows:
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>xxxxxxxxxxxxxxxxxxxx</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xxxxxxxxxxxxxxxxxxxxxxxx</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-east-1.amazonaws.com</value>
  </property>
</configuration>
But when I try to use/set the workspace with the following command:
USE shiv.`root`;
it gives me the following error:
Error: VALIDATION ERROR: Schema [shiv.root] is not valid with respect to either root schema or current default schema.
Current default schema: No default schema selected
[Error Id: 6d9515c0-b90f-48aa-9dc5-0c660f1c06ca on ip-10-0-3-241.ec2.internal:31010] (state=,code=0)
If I try to run show schemas; then I get the following error:
show schemas;
Error: SYSTEM ERROR: AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: EEB438A6A0A5E667, AWS Error Code: null, AWS Error Message: Bad Request
Fragment 0:0
[Error Id: 85883537-9b4f-4057-9c90-cdaedec116a8 on ip-10-0-3-241.ec2.internal:31010] (state=,code=0)
I am unable to understand the root cause of this problem.
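One sanity check that may help narrow this down is to confirm which fs.s3a.* properties the core-site file actually defines. The following is only a rough sketch in Python (the post itself contains no programming language, so Python is used as a stand-in). The helper names `read_hadoop_properties` and `missing_keys` and the list `EXPECTED_KEYS` are made up for illustration, and the XML string simply mirrors the placeholder file shown above. It does not contact S3 or Drill; it only parses the file contents.

```python
import xml.etree.ElementTree as ET

# Mirrors the core-site file shown in the question; values are placeholders.
CORE_SITE_XML = """
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>xxxxxxxxxxxxxxxxxxxx</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xxxxxxxxxxxxxxxxxxxxxxxx</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-east-1.amazonaws.com</value>
  </property>
</configuration>
"""

# Assumption for illustration: the three properties the file above sets.
EXPECTED_KEYS = ["fs.s3a.access.key", "fs.s3a.secret.key", "fs.s3a.endpoint"]


def read_hadoop_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_keys(props, expected):
    """List the expected keys that are absent or set to an empty value."""
    return [key for key in expected if not props.get(key)]


if __name__ == "__main__":
    props = read_hadoop_properties(CORE_SITE_XML)
    print("defined:", sorted(props))
    print("missing:", missing_keys(props, EXPECTED_KEYS))
```

To run it against a real file, the contents could be read with `open(path).read()` and passed to `read_hadoop_properties`. An empty "missing" list only shows that the properties are present in the file, not that Drill has picked them up or that the credentials are valid.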
I ran into a similar problem when using Apache Drill with GCS (Google Cloud Storage).
Running the USE gcs.data query gave the following error:
VALIDATION ERROR: Schema [gcs.data] is not valid with respect to either root schema or current default schema.
Current default schema: No default schema selected
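I ran SHOW SCHEMAS, and there was no gcs.data schema.
I then went ahead and created a data folder in my GCS bucket. After that, gcs.data showed up in SHOW SCHEMAS and the USE gcs.data query worked.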
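From my limited experience with Apache Drill, my understanding is that with file-type storage, if a workspace points to a folder that does not exist, Drill throws this error.
GCS and S3 are both file-type storage, so you may be running into the same problem.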
Here is my GCS storage configuration:
{
  "type": "file",
  "connection": "gs://my-gcs-bkt",
  "config": null,
  "workspaces": {
    "data": {
      "location": "/data",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    }
  },
  "enabled": true
}
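To make the "workspace points to a folder that does not exist" idea concrete, here is a minimal sketch in Python that compares the workspace locations in a plugin config against a list of object keys from the bucket. It is not how Drill performs the check internally. The helper names `folder_exists` and `missing_workspaces` and the sample object keys are hypothetical, and the key list stands in for whatever a real bucket listing would return. The JSON is a trimmed copy of the workspaces section above.

```python
import json

# Trimmed copy of the plugin config above: only the workspace locations matter here.
PLUGIN_JSON = """
{
  "type": "file",
  "connection": "gs://my-gcs-bkt",
  "workspaces": {
    "data": {"location": "/data", "writable": true},
    "tmp":  {"location": "/tmp",  "writable": true},
    "root": {"location": "/",     "writable": false}
  }
}
"""


def folder_exists(location, object_keys):
    """Treat a location as present if it is the bucket root, or if at
    least one object key sits under that folder prefix."""
    prefix = location.strip("/")
    if prefix == "":
        return True  # "/" is the bucket itself
    prefix += "/"
    return any(key.startswith(prefix) for key in object_keys)


def missing_workspaces(plugin_config, object_keys):
    """Return the names of workspaces whose folder has no matching key."""
    workspaces = plugin_config.get("workspaces") or {}
    return sorted(
        name
        for name, ws in workspaces.items()
        if not folder_exists(ws.get("location", "/"), object_keys)
    )


if __name__ == "__main__":
    config = json.loads(PLUGIN_JSON)

    # Hypothetical bucket listing before the data folder was created.
    keys_before = ["tmp/scratch.json"]
    print(missing_workspaces(config, keys_before))

    # Hypothetical listing after creating the data folder.
    keys_after = ["tmp/scratch.json", "data/"]
    print(missing_workspaces(config, keys_after))
```

With the first listing only the data workspace is reported as missing, which matches the situation where USE gcs.data failed. With the second listing nothing is reported. The same comparison could be applied to the /tmp workspace in the S3 config from the question.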