Problem deploying a SageMaker multi-model endpoint with AWS CDK / CloudFormation

Problem description

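I am trying to automate the deployment of a SageMaker multi-model endpoint with AWS CDK in Python (I assume the same would apply when writing the CloudFormation template directly in JSON/YAML), but the deployment fails with an error when the SageMaker model is created.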

Here is part of the CloudFormation template generated by the cdk synth command:

Resources:
  smmodelexecutionrole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: sagemaker.amazonaws.com
        Version: "2012-10-17"
      Policies:
        - PolicyDocument:
            Statement:
              - Action: s3:GetObject
                Effect: Allow
                Resource:
                  Fn::Join:
                    - ""
                    - - "arn:"
                      - Ref: AWS::Partition
                      - :s3:::<bucket_name>/deploy_multi_model_artifact/*
            Version: "2012-10-17"
          PolicyName: policy_s3
        - PolicyDocument:
            Statement:
              - Action: ecr:*
                Effect: Allow
                Resource:
                  Fn::Join:
                    - ""
                    - - "arn:"
                      - Ref: AWS::Partition
                      - ":ecr:"
                      - Ref: AWS::Region
                      - ":"
                      - Ref: AWS::AccountId
                      - :repository/<my_ecr_repository>
            Version: "2012-10-17"
          PolicyName: policy_ecr
    Metadata:
      aws:cdk:path: <omitted>
  smmodel:
    Type: AWS::SageMaker::Model
    Properties:
      ExecutionRoleArn:
        Fn::GetAtt:
          - smmodelexecutionrole
          - Arn
      Containers:
        - Image: xxxxxxxxxxxx.dkr.ecr.<my_aws_region>.amazonaws.com/<my_ecr_repository>/multi-model:latest
          Mode: MultiModel
          ModelDataUrl: s3://<bucket_name>/deploy_multi_model_artifact/
      ModelName: MyModel
    Metadata:
      aws:cdk:path: <omitted>

Running cdk deploy in the terminal produces the following error:

3/6 | 7:56:58 PM | CREATE_FAILED | AWS::SageMaker::Model | sm_model (smmodel)
Could not access model data at s3://<bucket_name>/deploy_multi_model_artifact/. 
Please ensure that the role "arn:aws:iam::xxxxxxxxxxxx:role/<my_role>" exists 
and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com". 
Also ensure that the role has "s3:GetObject" permissions and that the object is located in <my_aws_region>.
(Service: AmazonSageMaker; Status Code: 400; Error Code: ValidationException; Request ID: xxxxx)

I have:

  • An ECR repository containing the Docker image
  • An S3 bucket containing the model artifacts (.tar.gz files) inside the "folder" deploy_multi_model_artifact

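To test whether this was an IAM role problem, I replaced MultiModel with SingleModel and replaced s3://<bucket_name>/deploy_multi_model_artifact/ with s3://<bucket_name>/deploy_multi_model_artifact/one_of_my_artifacts.tar.gz, and the model was then created successfully. That makes me suspect the problem is not related to IAM, contrary to what the error message says (but I may be wrong!).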

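So I am wondering where the problem comes from. It is all the more confusing because I have already deployed this multi-model endpoint with boto3 without any problem.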
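
For comparison, a boto3 deployment of the same model definition might look roughly like the sketch below. The script actually used is not shown in the question, so the helper names (build_multi_model_request, create_model) and the placeholder values are assumptions for illustration; only the ModelName, Mode and ModelDataUrl values are taken from the template above.

```python
import json


def build_multi_model_request(model_name, role_arn, image_uri, model_prefix):
    """Build keyword arguments for the SageMaker create_model call.

    Hypothetical helper: it mirrors the Containers entry of the
    AWS::SageMaker::Model resource in the synthesized template.
    """
    if not model_prefix.startswith("s3://"):
        raise ValueError("model_prefix must be an s3:// URL")
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {
                "Image": image_uri,
                # MultiModel points at an S3 prefix, not at a single .tar.gz
                "Mode": "MultiModel",
                "ModelDataUrl": model_prefix,
            }
        ],
    }


def create_model(request, region_name):
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("sagemaker", region_name=region_name)
    return client.create_model(**request)


# Placeholder values, kept in the same elided form as in the question.
request = build_multi_model_request(
    model_name="MyModel",
    role_arn="arn:aws:iam::xxxxxxxxxxxx:role/<my_role>",
    image_uri="xxxxxxxxxxxx.dkr.ecr.<my_aws_region>.amazonaws.com/<my_ecr_repository>/multi-model:latest",
    model_prefix="s3://<bucket_name>/deploy_multi_model_artifact/",
)
print(json.dumps(request, indent=2))
```

Note that a boto3 script may well run with a different execution role than the one the CDK stack creates, which could explain why one path works and the other does not.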

Any help would be greatly appreciated!

(About multi-model endpoint deployment: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb)

deployment amazon-cloudformation amazon-iam amazon-sagemaker clouddevelopmentkit
1 Answer
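The problem was that I had forgotten to give the IAM role SageMaker access permissions. After attaching the AmazonSageMakerFullAccess managed policy to the IAM role, I was able to deploy the multi-model endpoint.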
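
As a rough sketch of what that fix changes in the synthesized template, the snippet below adds the managed policy to an AWS::IAM::Role resource represented as a plain Python dict. The helper name with_sagemaker_full_access is hypothetical and the role is a trimmed copy of the one in the question; in CDK Python the equivalent step would typically be role.add_managed_policy(iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSageMakerFullAccess")), which is not run here.

```python
import json

# AWS-managed policy named in the answer (ARN shown for the "aws" partition).
SAGEMAKER_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"


def with_sagemaker_full_access(role_resource):
    """Return a copy of an AWS::IAM::Role resource with the policy attached."""
    if role_resource.get("Type") != "AWS::IAM::Role":
        raise ValueError("expected an AWS::IAM::Role resource")
    # Round-trip through JSON to deep-copy, leaving the input untouched.
    updated = json.loads(json.dumps(role_resource))
    properties = updated.setdefault("Properties", {})
    arns = properties.setdefault("ManagedPolicyArns", [])
    if SAGEMAKER_FULL_ACCESS_ARN not in arns:
        arns.append(SAGEMAKER_FULL_ACCESS_ARN)
    return updated


# Trimmed version of smmodelexecutionrole (inline policies omitted).
role = {
    "Type": "AWS::IAM::Role",
    "Properties": {
        "AssumeRolePolicyDocument": {
            "Statement": [
                {
                    "Action": "sts:AssumeRole",
                    "Effect": "Allow",
                    "Principal": {"Service": "sagemaker.amazonaws.com"},
                }
            ],
            "Version": "2012-10-17",
        }
    },
}

patched = with_sagemaker_full_access(role)
print(json.dumps(patched["Properties"]["ManagedPolicyArns"], indent=2))
```

AmazonSageMakerFullAccess is a broad policy, so it may be worth narrowing the permissions once the deployment works.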