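I have an S3 bucket, mybucketlogs, which was created through the S3 logging feature of another bucket. My root account is the owner of both buckets and of all objects in both buckets.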
I followed the guide at https://repost.aws/knowledge-center/analyze-logs-athena and replaced the path with the root of the bucket where the logs are stored, e.g.
s3://mybucketlogs/
I can create the external database and the table, but when I run the query
SELECT * FROM "s3_access_logs_db"."mybucket_logs" limit 10;
I get an access denied error:
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: -; S3 Extended Request ID: ; Proxy: null), S3 Extended Request ID: (Bucket: mybucketlogs, Key: 2024/01/02/)
This query ran against the "s3_access_logs_db" database, unless qualified by the query.
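I am running this as the root account, so I am not sure why I get a permissions error. I can access the S3 bucket directly from the console and download any file.
I tried adding permissions to the S3 bucket policy, but that did not help: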
{
    "Version": "2012-10-17",
    "Id": "S3-Console-Auto-Gen-Policy-XXXXXXXXXX",
    "Statement": [
        {
            "Sid": "S3PolicyStmt-DO-NOT-MODIFY-XXXXXXXXXXXX",
            "Effect": "Allow",
            "Principal": {
                "Service": "logging.s3.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::mybucketlogs/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "athena.amazonaws.com"
            },
            "Action": "*",
            "Resource": [
                "arn:aws:s3:::mybucketlogs/*",
                "arn:aws:s3:::mybucketlogs"
            ]
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<my root id>:root"
            },
            "Action": "*",
            "Resource": [
                "arn:aws:s3:::mybucketlogs/*",
                "arn:aws:s3:::mybucketlogs"
            ]
        }
    ]
}
The CREATE TABLE query is as follows:
CREATE EXTERNAL TABLE s3_access_logs_db.mybucket_logs(
`bucketowner` STRING,
`bucket_name` STRING,
`requestdatetime` STRING,
`remoteip` STRING,
`requester` STRING,
`requestid` STRING,
`operation` STRING,
`key` STRING,
`request_uri` STRING,
`httpstatus` STRING,
`errorcode` STRING,
`bytessent` BIGINT,
`objectsize` BIGINT,
`totaltime` STRING,
`turnaroundtime` STRING,
`referrer` STRING,
`useragent` STRING,
`versionid` STRING,
`hostid` STRING,
`sigv` STRING,
`ciphersuite` STRING,
`authtype` STRING,
`endpoint` STRING,
`tlsversion` STRING,
`accesspointarn` STRING,
`aclrequired` STRING)
PARTITIONED BY (
`timestamp` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://mybucketlogs/'
TBLPROPERTIES (
'projection.enabled'='true',
'projection.timestamp.format'='yyyy/MM/dd',
'projection.timestamp.interval'='1',
'projection.timestamp.interval.unit'='DAYS',
'projection.timestamp.range'='2024/01/01,NOW',
'projection.timestamp.type'='date',
'storage.location.template'='s3://mybucketlogs/${timestamp}')
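To see which S3 locations this table definition points Athena at, the projection settings can be expanded by hand. The following is only a minimal sketch: it assumes the timestamp partition is enumerated one day at a time in the yyyy/MM/dd format and substituted into storage.location.template, and the function name projected_locations is hypothetical, not part of any Athena API.

```python
from datetime import date, timedelta

# Template copied from the table's 'storage.location.template' property.
TEMPLATE = "s3://mybucketlogs/${timestamp}"


def projected_locations(start, end, template=TEMPLATE):
    """Yield one S3 location per day from start to end, inclusive.

    Hypothetical helper: mimics a date projection with interval 1 DAYS
    and the format yyyy/MM/dd.
    """
    day = start
    while day <= end:
        # %Y/%m/%d is the Python equivalent of the yyyy/MM/dd format.
        yield template.replace("${timestamp}", day.strftime("%Y/%m/%d"))
        day += timedelta(days=1)


locations = list(projected_locations(date(2024, 1, 1), date(2024, 1, 3)))
for location in locations:
    print(location)
```

The second location printed, s3://mybucketlogs/2024/01/02, lines up with the Key: 2024/01/02/ reported in the error, which suggests the denied request is a read or listing of a projected partition prefix rather than the table creation itself.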
In the S3 policy for mybucketlogs, I tried adding Athena directly and full permissions for my root account, even though I should not need them, but this did not work.
{
    "Version": "2012-10-17",
    "Id": "S3-Console-Auto-Gen-Policy-XXXXXXXXXXXXXX",
    "Statement": [
        {
            "Sid": "S3PolicyStmt-DO-NOT-MODIFY-XXXXXXXXXXXXXXX",
            "Effect": "Allow",
            "Principal": {
                "Service": "logging.s3.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::mybucketlogs/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "athena.amazonaws.com"
            },
            "Action": "*",
            "Resource": [
                "arn:aws:s3:::mybucketlogs/*",
                "arn:aws:s3:::mybucketlogs"
            ]
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::XXXXXXXXXXX:root"
            },
            "Action": "*",
            "Resource": [
                "arn:aws:s3:::mybucketlogs/*",
                "arn:aws:s3:::mybucketlogs"
            ]
        }
    ]
}
As far as I remember, an S3 bucket policy needs the actions in its Action block to be listed explicitly. So try adding the s3:GetObject and s3:ListBucket permissions:
{
    "Effect": "Allow",
    "Principal": {
        "Service": "athena.amazonaws.com"
    },
    "Action": [
        "s3:GetObject",
        "s3:ListBucket"
    ],
    "Resource": [
        "arn:aws:s3:::mybucketlogs/*",
        "arn:aws:s3:::mybucketlogs"
    ]
}
Also grant permissions for Glue.
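The S3 statement suggested above can also be generated instead of typed by hand. This is a minimal sketch, assuming the same bucket name and principal as in the statement; the helper name read_only_statement is hypothetical. s3:ListBucket is evaluated against the bucket ARN and s3:GetObject against object ARNs, so both Resource entries are kept.

```python
import json

BUCKET = "mybucketlogs"


def read_only_statement(bucket, principal):
    """Build a bucket-policy statement with explicit read-only actions.

    Hypothetical helper: it only assembles the JSON structure and does
    not call AWS.
    """
    return {
        "Effect": "Allow",
        "Principal": principal,
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{bucket}/*",  # object ARNs, used by s3:GetObject
            f"arn:aws:s3:::{bucket}",    # bucket ARN, used by s3:ListBucket
        ],
    }


statement = read_only_statement(BUCKET, {"Service": "athena.amazonaws.com"})
print(json.dumps(statement, indent=4))
```

The printed JSON can be pasted into the Statement array of the bucket policy in place of the "Action": "*" entry.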