使用基于会话的令牌身份验证将数据从hdfs移动到s3

问题描述 投票:2回答:1

在将数据从hdfs移动到S3时,有人可以帮助我进行身份验证。要连接到S3,我使用aws_key_gen(access_key,secret_key和基于会话的令牌)生成基于会话的凭据

我测试过,distcp在永久访问和密钥的情况下工作正常,但是会给基于会话的密钥带来问题。此外,我使用python测试了基于会话的凭据,并能够列出目录。

附件是代码(密钥和存储桶名称已更改)。

-----------------------------------
--Below python code working fine
----------------------------------- 
python
import boto3
session = boto3.Session(
    aws_access_key_id='ASIA123456789G7MPE3N',
    aws_secret_access_key='Nsz7d05123o456o789o0UdVRQWa7y7i3MED2L6/u',
    aws_session_token='FQoGZXIvYXdzEPr//////////wEa123o345o567ohytzmnAAj7YnHgnjfhAmrsdUmTFRZSoZmgeIKKF3dY+/ZzQadteRRSitmq+/llvnWBlA1WHfneMaN/yOawAAO2aSLjkLXZsC2G0Gtt+dcmS9zhy8ye+FfDppODc3yiBoYfpmOuXfMqbyDt3nnYe3Hlq44DWS7wqIb72X+s2ebiNghNWxyD1VJM1qT68/OIUYrjarNDGWhDCKRU21Sjqk4FWgwSUX5f5cIoTwvnhAkFwwD8TIRt5sFgMEfDrBjIj22oILF5xrfaDRr3hc3dLKb7jZUxMWWSCbQZXA5sGE78/UazA8ufEAKPVkWdYi+q39RvR9K2mjrWD1jc6cCrj+ScWCJ+CfWcoVev/QtHqu4WHYfORfinuZUEHLOTIwU/Gz83UdQ1KMvi39wF'
)
s3=session.resource('s3')
my_bucket = s3.Bucket('mybucket')
for object in my_bucket.objects.all():
    print(object)   


-----------------------------------------
--Below distcp is giving forbidden error
-----------------------------------------

AWS_ACCESS_KEY_ID='ASIA123456789G7MPE3N'
AWS_SECRET_ACCESS_KEY='Nsz7d05123o456o789o0UdVRQWa7y7i3MED2L6/u'
AWS_SESSION_TOKEN='FQoGZXIvYXdzEPr//////////wEa123o345o567ohytzmnAAj7YnHgnjfhAmrsdUmTFRZSoZmgeIKKF3dY+/ZzQadteRRSitmq+/llvnWBlA1WHfneMaN/yOawAAO2aSLjkLXZsC2G0Gtt+dcmS9zhy8ye+FfDppODc3yiBoYfpmOuXfMqbyDt3nnYe3Hlq44DWS7wqIb72X+s2ebiNghNWxyD1VJM1qT68/OIUYrjarNDGWhDCKRU21Sjqk4FWgwSUX5f5cIoTwvnhAkFwwD8TIRt5sFgMEfDrBjIj22oILF5xrfaDRr3hc3dLKb7jZUxMWWSCbQZXA5sGE78/UazA8ufEAKPVkWdYi+q39RvR9K2mjrWD1jc6cCrj+ScWCJ+CfWcoVev/QtHqu4WHYfORfinuZUEHLOTIwU/Gz83UdQ1KMvi39wF'
AWS_CREDENTIALS_PROVIDER='org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider'
hadoop distcp -Dfs.s3a.access.key="${AWS_ACCESS_KEY_ID}" -Dfs.s3a.secret.key="${AWS_SECRET_ACCESS_KEY}" -Dfs.s3a.session.token="${AWS_SESSION_TOKEN}" 1.csv s3a://mybucket/temp
hadoop distcp -Dfs.s3a.access.key="${AWS_ACCESS_KEY_ID}" -Dfs.s3a.secret.key="${AWS_SECRET_ACCESS_KEY}" -Dfs.s3a.session.token="${AWS_SESSION_TOKEN}" -Dfs.s3a.aws.credentials.provider="${AWS_CREDENTIALS_PROVIDER}" 1.csv s3a://mybucket/temp
amazon-s3 hdfs hadoop2 distcp s3distcp
1个回答
1
投票
  1. 会话密钥支持仅适用于Hadoop 2.8,因此如果您使用的是早期版本:没有快乐。
  2. 在Hadoop 2.8+上应该可行。在担心distcp之前尝试使用cloudstorehadoop fs命令,因为这是额外的麻烦
© www.soinside.com 2019 - 2024. All rights reserved.