AWS Lambda Python Docker container: task timed out after 600.01 seconds


I am running a Docker container on Lambda (not in a VPC) that makes GET requests to an API, then transforms the data and loads it into S3. However, whenever the function runs I get the following error:

OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
START RequestId: 0bb8da57-8fb8-406e-9f97-87f85bc0a68e Version: $LATEST
2023-04-23T19:02:31.782Z 0bb8da57-8fb8-406e-9f97-87f85bc0a68e Task timed out after 600.01 seconds

END RequestId: 0bb8da57-8fb8-406e-9f97-87f85bc0a68e
REPORT RequestId: 0bb8da57-8fb8-406e-9f97-87f85bc0a68e  Duration: 600007.09 ms  Billed Duration: 609569 ms  Memory Size: 256 MB Max Memory Used: 154 MB Init Duration: 9560.94 ms   

This is what the Dockerfile for the container running on Lambda looks like:

FROM public.ecr.aws/lambda/python:3.9
  
COPY stepstone_scraper.py ${LAMBDA_TASK_ROOT}
COPY requirements.txt ./

RUN pip install -r requirements.txt -t "${LAMBDA_TASK_ROOT}"

RUN chmod 644 $(find . -type f)
RUN chmod 755 $(find . -type d)

CMD ["stepstone_scraper.main"]
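Before deploying, the image can be exercised locally: the public.ecr.aws/lambda/python base images bundle the AWS Lambda Runtime Interface Emulator, so the container can be invoked over HTTP without touching AWS. A sketch of the commands (the image tag stepstone-scraper is an assumption, not from the original setup):

```shell
# build the image and start it; the base image's entrypoint runs the
# Runtime Interface Emulator, which listens on port 8080 in the container
docker build -t stepstone-scraper .
docker run -p 9000:8080 stepstone-scraper

# in another terminal: invoke the handler (the image CMD) with an empty event
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
```

If the local invocation also hangs, the problem is in the code or its outbound requests rather than in the Lambda configuration.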

And this is the main function in the Python file:

def main(event, context):
    job_df = pd.DataFrame(columns=COLUMNS)
    # build the S3 key prefix once instead of repeating it in both branches
    prefix = f"job_data/{str(date.today())[0:4]}/{str(date.today())[5:7]}/{str(date.today())[8:10]}"
    try:
        for i in range(get_pages()):
            cards = extract_job_cards('data-engineer', 1, i)
            extract_and_append_skills(cards, job_df)
        job_df = job_df[~job_df.duplicated(subset=['HREF_LINK'])].copy()
        upload_to_s3(job_df, "jobs.parquet", "job-scraping", prefix)
        return "Success"
    except Exception:  # a bare except would also swallow SystemExit/KeyboardInterrupt
        upload_to_s3(job_df, "jobs.parquet", "job-scraping", prefix)
        return "Failure"
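As a side note, the key prefix built by slicing str(date.today()) in both branches can be produced in one step with strftime. A small sketch; build_prefix is a hypothetical helper, not part of the original code:

```python
from datetime import date

def build_prefix(d: date) -> str:
    # "job_data/YYYY/MM/DD" -- same result as slicing str(date.today())
    return d.strftime("job_data/%Y/%m/%d")
```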

And this is how I build the Lambda function in Terraform:

resource "aws_lambda_function" "job_scraping_function" {
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.scraping_repo.repository_url}:latest"
  function_name = "job_scraping_function"
  role          = aws_iam_role.lambda_s3_role.arn
  # handler and runtime must not be set with package_type = "Image";
  # the image's CMD ("stepstone_scraper.main") acts as the handler
  memory_size   = 256
  timeout       = 600
  depends_on    = [null_resource.docker_build_and_push]
}

When I run the code locally everything works fine and takes about 30 to 60 seconds. The role assigned to the function can be assumed by Lambda and has the required access permissions.
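One way to at least capture partial results instead of losing everything at the 600-second cutoff is to watch the remaining time on the Lambda context object and stop scraping early. A minimal sketch of the pattern, assuming the loop from main above; pages stands in for get_pages() and the appended values stand in for the scraped cards:

```python
def scrape_with_deadline(context, pages, min_remaining_ms=30_000):
    """Stop the scraping loop when the Lambda deadline approaches.

    get_remaining_time_in_millis() is a standard method on the real
    Lambda context object.
    """
    scraped = []
    for i in range(pages):
        if context.get_remaining_time_in_millis() < min_remaining_ms:
            break  # leave enough time to upload the partial DataFrame to S3
        scraped.append(i)  # placeholder for extract_job_cards(...)
    return scraped
```

With this guard in place the function would return a partial upload instead of timing out, and the logs would show how far the loop actually got.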

Does anyone know what is going wrong here?

python amazon-web-services docker web-scraping aws-lambda