Google Cloud AI Platform error when running a job


We are creating jobs on AI Platform using the Python googleapiclient API.

import datetime
import logging

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

# data, target_label, BUCKET, experiment_id and PROJECT are defined elsewhere.
credentials = GoogleCredentials.get_application_default()
training_inputs = {
    'scaleTier': 'CUSTOM',
    'masterType': 'complex_model_m',
    'packageUris': ['package_bucket_file_path'],
    'pythonModule': 'randomforest_trainer_RUL.train',
    'args': [
        '--trainFilePath', data[0],
        '--trainOutputPath', data[2],
        '--testFilePath', data[1],
        '--testOutputPath', data[3],
        '--target', target_label,
        '--bucket', BUCKET,
        '--expid', experiment_id,
    ],
    'region': 'region_of_bucket',
    'runtimeVersion': '1.14',
    'pythonVersion': '3.5',
}

timestamp = datetime.datetime.now().strftime('%y%m%d_%H%M%S%f')
job_name = "job_" + experiment_id
# job_spec was not shown in the original post; the usual request body shape is assumed here.
job_spec = {'jobId': job_name, 'trainingInput': training_inputs}

logging.info("Job Name:{}".format(job_name))

api = discovery.build('ml', 'v1', credentials=credentials, cache_discovery=False)
project_id = 'projects/{}'.format(PROJECT)
request = api.projects().jobs().create(body=job_spec, parent=project_id)
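
The snippet computes a timestamp but never uses it, so two submissions for the same experiment would share a job ID. The following is a minimal sketch, not the asker's code, of how the request body could be assembled with the timestamp folded into the ID. The helper name `build_job_spec` and its `now` parameter are hypothetical; `jobId` and `trainingInput` are the field names of the AI Platform Training job resource. It assumes job IDs are limited to letters, digits and underscores.

```python
import datetime
import re


def build_job_spec(experiment_id, training_inputs, now=None):
    """Return a request body for projects.jobs.create with a unique job ID."""
    now = now or datetime.datetime.now()
    timestamp = now.strftime('%y%m%d_%H%M%S%f')
    # Assumption: job IDs may only contain letters, digits and underscores,
    # so any other character in the experiment ID is replaced.
    safe_id = re.sub(r'[^A-Za-z0-9_]', '_', str(experiment_id))
    job_name = 'job_{}_{}'.format(safe_id, timestamp)
    return {'jobId': job_name, 'trainingInput': training_inputs}


# Example with a fixed time so the result is reproducible.
spec = build_job_spec('exp-42', {'scaleTier': 'BASIC'},
                      now=datetime.datetime(2019, 12, 1, 10, 30, 0))
print(spec['jobId'])
```

The returned dictionary can be passed as `body` to `api.projects().jobs().create(...)` in place of a hand-built `job_spec`.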

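This was working: until yesterday I could train, test, and predict with the model. Suddenly I can no longer train models on AI Platform, and the error I get is: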

The replica master 0 exited with a non-zero status of 1.
Traceback (most recent call last):
  [...]
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 810, in ls
    combined_listing = self._ls(path, detail) + self._ls(path + "/", detail)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-12>", line 2, in _ls
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 820, in _ls
    listing = self._list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-5>", line 2, in _list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 616, in _list_objects
    listing = self._do_list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-6>", line 2, in _do_list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 637, in _do_list_objects
    maxResults=max_results,
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-2>", line 2, in _call
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 517, in _call
    validate_response(r, path)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 171, in validate_response
    raise IOError("Forbidden: %s\n%s" % (path, msg))
OSError: Forbidden: https://www.googleapis.com/storage/v1/b/some-storage-bucket/o/
[email protected] does not have serviceusage.services.use access to project 34XX12XX12X.

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=87XX90XX1XX&resource=ml_job%2Fjob_id%2Fjob_5de3592da3c3c541d73389er&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22job_5de3592da3c3c541d73389erce%22

The key part of the error is:

[email protected] 
    does not have serviceusage.services.use access to project 34XX12XX12X
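
A job's error message sometimes arrives as one long string with literal `\n` sequences, which makes the final `OSError` line hard to spot. The following is a small sketch for turning such a string into readable lines. The helper name `unescape_job_error` and the shortened sample message are illustrative assumptions, not part of any Google library.

```python
def unescape_job_error(message):
    """Turn literal backslash-n sequences into real line breaks."""
    # Protect the doubly escaped form first, so that a '\n' inside a quoted
    # source line stays visible instead of becoming a line break.
    placeholder = '\x00'
    text = message.replace('\\\\n', placeholder)
    text = text.replace('\\n', '\n')
    return text.replace(placeholder, '\\n')


# Shortened, illustrative sample of an escaped error message.
raw = ('The replica master 0 exited with a non-zero status of 1. '
       '\\nTraceback (most recent call last):\\n  [...]\\nOSError: Forbidden')
for line in unescape_job_error(raw).splitlines():
    print(line)
```

With the message split this way, the last lines of the traceback (the exception type and the permission it names) are the ones to read first.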
python google-cloud-ml
1 Answer

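I ran into exactly the same problem today. As Nick said, this is an issue with the new GCSFS release. Instead of calling pd.read_csv(gcs_path), which goes through gcsfs, I suggest reading the CSV file from the bucket directly with TensorFlow's GFile function.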

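import pandas as pd
import tensorflow as tf

def read_csv_from_gcs(gcs_path, opts=None):  # function name is illustrative
    # Open the object with TensorFlow's GFile so pandas never calls gcsfs.
    with tf.gfile.GFile(gcs_path) as f:
        # opts is assumed to be a dict of pandas.read_csv keyword arguments.
        if opts:
            df = pd.read_csv(f, **opts)
        else:
            df = pd.read_csv(f)
    return df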

This lets your jobs run without interruption.
