I'm using Pydantic to validate and type an incoming S3 Event in an AWS Lambda function.
The event looks like this (only the relevant parts are shown):
{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "my-bucket"
        },
        "object": {
          "key": "MYKEY%28CSV%29/XXXX.CSV"
        }
      }
    }
  ]
}
I define my Model like this to pull out the relevant information.
from pydantic import BaseModel


class ObjectInfo(BaseModel):
    key: str


class BucketInfo(BaseModel):
    name: str


class S3Schema(BaseModel):
    bucket: BucketInfo
    object: ObjectInfo


class Record(BaseModel):
    s3: S3Schema


class DeletionEvent(BaseModel):
    Records: list[Record]


def handler(event: dict, _):
    eventTyped = DeletionEvent(**event)
    return True
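The problem is that the correct value of key should be MYKEY(CSV)/XXXX.CSV, not MYKEY%28CSV%29/XXXX.CSV. I usually solve this with urllib.parse.unquote_plus, which decodes the %XX sequences that represent special characters. I suppose I could define a custom decoder, but that seems like overkill.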
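For context, the manual approach mentioned above might look roughly like this. It is only a sketch that works on the raw event dict from the example; the variable names are made up for illustration.

```python
from urllib.parse import unquote_plus

# Same shape as the example event, trimmed to the relevant fields.
event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "MYKEY%28CSV%29/XXXX.CSV"},
            }
        }
    ]
}

# Decode each object key by hand after reading it from the raw event.
decoded_keys = [
    unquote_plus(record["s3"]["object"]["key"]) for record in event["Records"]
]

print(decoded_keys)  # ['MYKEY(CSV)/XXXX.CSV']
```

This works, but the decoding lives outside the model, which is the part I would like Pydantic to handle.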
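Is there a way to have pydantic do the decoding for me? It has a bunch of classes for handling URLs, but I didn't see anything about decoding a URL-encoded string on its own.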
It still feels like Pydantic should have a better way. This is the solution I found:

from urllib.parse import unquote
from typing_extensions import Annotated
from pydantic import (
    BaseModel,
    EncodedStr,
    EncoderProtocol
)


# This is the class that will be used to "decode" my URL string
class MyEncoder(EncoderProtocol):
    @classmethod
    def decode(cls, data: bytes) -> bytes:
        # We have to use unquote rather than unquote_plus because only unquote can work with bytes objects.
        # This may be a limitation if your URL string contains spaces encoded as '+'.
        return str.encode(unquote(data))


MyEncodedStr = Annotated[str, EncodedStr(encoder=MyEncoder)]


class ObjectInfo(BaseModel):
    key: MyEncodedStr


class BucketInfo(BaseModel):
    name: str


class S3Schema(BaseModel):
    bucket: BucketInfo
    object: ObjectInfo


class Record(BaseModel):
    s3: S3Schema


class DeletionEvent(BaseModel):
    Records: list[Record]


event = {
    "Records": [
        {
            "s3": {
                "bucket": {
                    "name": "my-bucket"
                },
                "object": {
                    "key": "MYKEY%28CSV%29/XXXX.CSV"
                }
            }
        }
    ]
}

eventTyped = DeletionEvent(**event)
This correctly converts the URL-encoded string "MYKEY%28CSV%29/XXXX.CSV" into the plain string "MYKEY(CSV)/XXXX.CSV". My understanding of what happens:
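1. Pydantic converts the str to bytes.
2. MyEncoder.decode is called on the bytes object.
3. urllib.parse.unquote returns a str.
4. Pydantic expects decode to return a bytes object, so the result is encoded again.
5. Pydantic converts the bytes object back to a str.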
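
The steps above can be traced by hand with the standard library alone. This is a sketch of the round trip as I understand it, not Pydantic's actual internals. The variable names are invented for illustration, and passing bytes to unquote assumes Python 3.9 or later.

```python
from urllib.parse import unquote, unquote_plus

raw = "MYKEY%28CSV%29/XXXX.CSV"

as_bytes = raw.encode()        # step 1: str -> bytes before decode() is called
unquoted = unquote(as_bytes)   # steps 2-3: unquote accepts bytes and returns a str
returned = unquoted.encode()   # step 4: decode() hands bytes back
final = returned.decode()      # step 5: bytes -> str for the model field

print(final)  # MYKEY(CSV)/XXXX.CSV

# The limitation noted in the code comment: unquote leaves '+' alone,
# while unquote_plus also turns '+' into a space.
print(unquote("a+b%20c"))       # a+b c
print(unquote_plus("a+b%20c"))  # a b c
```

If the keys can contain spaces encoded as '+', the decode method would need extra handling for that case, since unquote alone will not convert them.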