Sagemaker training job with a custom dataset fails with ErrorMessage "FileNotFoundError: No such file or directory: '/opt/ml/input/data/training/train/data.csv'"

Question (votes: 0, answers: 1)

AlgorithmError: ExecuteUserScriptError: ExitCode 1 ErrorMessage "FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/training/train/data.csv'" Command "/opt/conda/bin/python3.8 transfer_learning.py --attn_dropout 0.17493582725215484 --batch_size 128 --frac_shared_embed 0.25 --input_dim 32 --learning_rate 0.008165108541098379 --mlp_dropout 0.2870045773619837 --n_blocks 4 --n_epochs 80 --patience 15", exit code: 1

from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner  # needed for the use_amt branch below
from sagemaker.utils import name_from_base

training_job_name = name_from_base(f"jumpstart-{train_model_id}-training")

# Create SageMaker Estimator instance
tabular_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

if use_amt:

    tuner = HyperparameterTuner(
        tabular_estimator,
        "r2",
        hyperparameter_ranges,
        [{"Name": "r2", "Regex": "metrics={'r2': (\\S+)}"}],
        max_jobs=10,  # increase the max_jobs to achieve better performance from hyperparameter tuning
        max_parallel_jobs=2,
        objective_type="Maximize",
        base_tuning_job_name=training_job_name,
    )

    tuner.fit({"training": training_dataset_s3_path}, logs=True)

else:
    # Launch a SageMaker Training job by passing s3 path of the training data
    tabular_estimator.fit(
        {"training": training_dataset_s3_path}, logs=True, job_name=training_job_name
    )
python tensorflow amazon-s3 amazon-sagemaker
1 Answer
0 votes

As noted in the comments, you have probably supplied an incorrect file path in S3.

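Some possible culprits:

  • The training data is in a format other than csv
  • data.csv was placed in a different folder, possibly directly under training_dataset_s3_path instead of training_dataset_s3_path/train
  • A typo in the path or the file name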
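
One way to check the points above before launching the job is to list what actually sits under training_dataset_s3_path and compare it with the key the script expects. The sketch below is only illustrative: the helper names (split_s3_uri, expected_key, diagnose, list_keys) and the bucket name in the usage note are hypothetical, and it assumes the default File input mode, where objects under the S3 prefix are copied to /opt/ml/input/data/training/ with their relative paths preserved.

```python
from urllib.parse import urlparse

# Relative path the training script looks for inside the "training" channel,
# taken from the error message in the question.
EXPECTED_RELATIVE_PATH = "train/data.csv"


def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/prefix' into (bucket, prefix)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def expected_key(s3_uri):
    """Return (bucket, key) where data.csv is assumed to have to live."""
    bucket, prefix = split_s3_uri(s3_uri)
    prefix = prefix.rstrip("/")
    if prefix:
        return bucket, f"{prefix}/{EXPECTED_RELATIVE_PATH}"
    return bucket, EXPECTED_RELATIVE_PATH


def diagnose(keys, s3_uri):
    """Compare the object keys found under the prefix with the expected key."""
    _, wanted = expected_key(s3_uri)
    if wanted in keys:
        return "ok"
    misplaced = [k for k in keys if k.rsplit("/", 1)[-1] == "data.csv"]
    if misplaced:
        return f"data.csv is in the wrong place: {misplaced}, expected {wanted}"
    return f"no data.csv found, expected {wanted}"


def list_keys(s3_uri):
    """List every object key under the prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    bucket, prefix = split_s3_uri(s3_uri)
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

In the notebook this could be called as print(diagnose(list_keys(training_dataset_s3_path), training_dataset_s3_path)). If it reports that data.csv is in the wrong place, moving the file into a train/ subfolder under the prefix should match the path from the error message.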