Is there a more efficient way to filter to a specific key rather than a prefix in this S3 rollback code?


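I have this code that rolls back an S3 object to a specific version, but the method I'm using has no "key" option, only "prefix". This is a problem because, in this example, I would also end up deleting all versions of an object named "questions copy". So, as you can see, I have to filter in Python, which seems less efficient.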

import boto3
import logging
from operator import attrgetter

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

def rollback_object(bucket, object_key, version_id):
    """
    Rolls back an object to an earlier version by deleting all versions that
    occurred after the specified rollback version.

    Usage is shown in the usage_demo_single_object function at the end of this module.

    :param bucket: The bucket that holds the object to roll back.
    :param object_key: The object to roll back.
    :param version_id: The version ID to roll back to.
    """
    # Versions must be sorted by last_modified because delete markers are at the
    # end of the list even when they are interspersed in time. Sorting newest
    # first puts everything newer than the rollback target at the front.
    versions = sorted(
        # Prefix also matches other keys that start with object_key (for example
        # "questions copy"), so the result is filtered by exact key below.
        bucket.object_versions.filter(Prefix=object_key),
        key=attrgetter("last_modified"),
        reverse=True,
    )
    # Keep only versions of the exact key. Delete markers newer than the rollback
    # target stay in the list and are deleted as well.
    filtered_versions = [v for v in versions if v.key == object_key]

    logger.debug(
        "Got versions:\n%s",
        "\n".join(
            [
                f"\t{version.version_id}, last modified {version.last_modified}"
                for version in filtered_versions
            ]
        ),
    )

    if version_id in [ver.version_id for ver in filtered_versions]:
        print(f"Rolling back to version {version_id}")
        for version in filtered_versions:
            if version.version_id != version_id:
                version.delete()
                print(f"Deleted version {version.version_id}")
            else:
                break

        print(f"Active version is now {bucket.Object(object_key).version_id}")
    else:
        raise KeyError(
            f"{version_id} was not found in the list of versions for " f"{object_key}."
        )


if __name__ == '__main__':
    mybucket = boto3.resource('s3').Bucket('scottedwards2000')
    # rollback_object returns None, so there is no result to print.
    rollback_object(mybucket, 'questions', 'RQY0ebFXtUnm.A48N2I62CEmdu2QZGEO')
python amazon-web-services amazon-s3 boto3
1 Answer

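So, the situation appears to be:

  • There is no way to request the list of versions for one specific object
  • Instead, you need to use .object_versions (or list_object_versions() for client calls), which returns versions for all objects in the bucket but can be filtered
  • The only form of filtering is by Prefix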
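
As an illustration of the points above, here is a minimal sketch of the client-side equivalent: list by Prefix, then keep only the exact key. The function name list_versions_for_key is hypothetical, and client is assumed to be a boto3 S3 client created by the caller.

```python
def list_versions_for_key(client, bucket_name, object_key):
    """Return versions and delete markers for exactly object_key, newest first.

    Hypothetical helper. `client` is assumed to be boto3.client("s3").
    """
    paginator = client.get_paginator("list_object_versions")
    entries = []
    # Prefix narrows the listing on the server side, but it still matches
    # every key that merely starts with object_key.
    for page in paginator.paginate(Bucket=bucket_name, Prefix=object_key):
        for entry in page.get("Versions", []) + page.get("DeleteMarkers", []):
            # The exact-key check has to happen on the client side.
            if entry["Key"] == object_key:
                entries.append(entry)
    # Sort so that versions and delete markers are interleaved by time.
    return sorted(entries, key=lambda e: e["LastModified"], reverse=True)
```

It could be called as list_versions_for_key(boto3.client("s3"), "scottedwards2000", "questions"). The Prefix still does useful work here: it keeps unrelated keys out of the response, so the Python filter only has to discard keys that share the same beginning.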

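You asked about efficiency. Since your code is written to roll back only a single object, it isn't really possible to reduce the number of API calls to AWS. It might be more "efficient" if ALL versions in the bucket were retrieved once and you then determined the versions from the returned data. Similarly, using delete_objects() to delete multiple object versions in one API call would likely be more efficient than "one API call per object version to delete".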
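
A minimal sketch of the batched delete, assuming client is a boto3 S3 client and entries are dicts with "Key" and "VersionId" (the shape returned by list_object_versions), sorted newest first. Both function names are hypothetical. The batch size of 1000 reflects the per-request limit of delete_objects.

```python
def versions_newer_than(entries, version_id):
    """Return the entries that precede version_id in a newest-first list."""
    newer = []
    for entry in entries:
        if entry["VersionId"] == version_id:
            return newer
        newer.append(entry)
    raise KeyError(f"{version_id} was not found in the list of versions.")


def delete_versions_in_batches(client, bucket_name, entries, batch_size=1000):
    """Delete the given versions with one delete_objects call per batch.

    Hypothetical helper. Returns the number of versions submitted for deletion.
    """
    submitted = 0
    for start in range(0, len(entries), batch_size):
        batch = entries[start:start + batch_size]
        response = client.delete_objects(
            Bucket=bucket_name,
            Delete={
                "Objects": [
                    {"Key": e["Key"], "VersionId": e["VersionId"]} for e in batch
                ],
                # Quiet mode reports only failures in the response.
                "Quiet": True,
            },
        )
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"Failed to delete some versions: {errors}")
        submitted += len(batch)
    return submitted
```

For a single object with a handful of newer versions the saving is small, but it replaces one request per version with one request per batch.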

As for the efficiency of the list operation, it runs quickly, so there is no real benefit in changing it.

I notice that your filtered_versions check avoids situations where a key such as foo/lunch would also match foo/lunchtime. It's good that you recognized the potential problem.

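An alternative approach that could be considered "more efficient" is to copy the desired previous version to the same key, which makes that data the "current" version of the object. So, instead of "deleting everything since that version", you "copy that version as the current version". That way you never lose any versions, and you can even "roll back" to a newer version later!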
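
A minimal sketch of the copy approach, assuming client is a boto3 S3 client and the bucket has versioning enabled. The function name restore_version_by_copy is hypothetical.

```python
def restore_version_by_copy(client, bucket_name, object_key, version_id):
    """Make an earlier version current by copying it onto the same key.

    Hypothetical helper. Returns the version ID of the newly created copy,
    or None if the response does not include one.
    """
    response = client.copy_object(
        Bucket=bucket_name,
        Key=object_key,
        # The source is the same key, pinned to the version to restore.
        CopySource={
            "Bucket": bucket_name,
            "Key": object_key,
            "VersionId": version_id,
        },
    )
    return response.get("VersionId")
```

This needs no listing at all, only the key and the version ID, so the Prefix problem does not arise. Note that copy_object is a single-request copy and is limited to objects of up to 5 GB; larger objects would need a multipart copy.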
