Adding data with dagshub.upload.Repo(USER_NAME, REPO_NAME)


I want to add a raw dataset file to my DagsHub repository (it's my first repository, and I'm using it with the MLflow tutorial).

This call is giving me trouble:

repo = dagshub.upload.Repo(USER_NAME,REPO_NAME)

repo.upload(local_path='data/winequality.txt',
            remote_path='data/raw/winequality.txt',
            commit_message='Added Raw Data',
            versioning='dvc')

Here is the error I get:

Uploading files (1) to "USER_NAME/REPO_NAME"...
---------------------------------------------------------------------------
DagsHubAPIError                           Traceback (most recent call last)
<ipython-input-49-e8d1e8493248> in <cell line: 4>()
      2 repo = dagshub.upload.Repo(USER_NAME,REPO_NAME)
      3 
----> 4 repo.upload(local_path='data/winequality.txt',
      5             remote_path='data/raw/winequality.txt',
      6             commit_message='Added Raw Data',

2 frames
/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in upload(self, local_path, commit_message, remote_path, **kwargs)
    286         else:
    287             file_to_upload = DataSet.get_file(str(local_path), remote_path)
--> 288             self.upload_files([file_to_upload], commit_message=commit_message, **kwargs)
    289 
    290     def upload_files(

/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in upload_files(self, files, directory_path, commit_message, versioning, new_branch, last_commit, force)
    375             timeout=None,
    376         )
--> 377         self._log_upload_details(data, res, files)
    378 
    379         # The ETag header contains the hash of the uploaded commit,

/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in _log_upload_details(self, data, res, files)
    413             log_message(f"Got unknown successful status code {res.status_code}")
    414         else:
--> 415             raise determine_upload_api_error(res)
    416 
    417     def _poll_mirror_up_to_date(self):

DagsHubAPIError: file missing from storage:
Required resource is missing from the storage, is '' stored in your repository DagsHub storage?

The repository file structure looks like this:

Local:
root/
|...data/
    |...winequality.txt

Remote:
root/
|...data/
    |...raw/

Note that "raw" is versioned with DVC, but the DagsHub documentation ("Upload Data") shows that this is the way to do it.

I'm not sure what I'm missing.

Tags: upload, dvc, dagshub

Answer (0 votes)

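The problem seems to be caused by missing DVC-tracked files for data/raw (the error says the required resource is not in your DagsHub storage), which prevents new files from being added to that directory. To fix it, first install DVC with S3 support if you haven't already:

pip install dvc "dvc[s3]"

Then clone the repository and configure DagsHub storage as a local DVC remote:

git clone https://dagshub.com/<user_name>/<repo_name>.git
cd <repo_name>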

dvc remote add origin --local s3://dvc
dvc remote modify origin --local endpointurl https://dagshub.com/<user_name>/<repo_name>.s3

# Use your DagsHub token as both the access key ID and the secret access key
dvc remote modify origin --local access_key_id <your_token>
dvc remote modify origin --local secret_access_key <your_token>
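The remote configuration above can also be scripted from Python. This is a minimal sketch, assuming `dvc` is on your PATH, that you run it from inside the cloned repository, and that your token is stored in an environment variable named `DAGSHUB_TOKEN` (a hypothetical name chosen here). The function names `dagshub_remote_commands` and `configure_dagshub_remote` are also hypothetical; the commands they run are the same four shown in the answer.

```python
import os
import subprocess
from typing import Callable, List


def dagshub_remote_commands(user_name: str, repo_name: str, token: str) -> List[List[str]]:
    """Build the four `dvc remote` commands from the answer as argument lists."""
    endpoint = f"https://dagshub.com/{user_name}/{repo_name}.s3"
    base = ["dvc", "remote"]
    return [
        base + ["add", "origin", "--local", "s3://dvc"],
        base + ["modify", "origin", "--local", "endpointurl", endpoint],
        # The answer uses the DagsHub token for both S3 credentials.
        base + ["modify", "origin", "--local", "access_key_id", token],
        base + ["modify", "origin", "--local", "secret_access_key", token],
    ]


def configure_dagshub_remote(user_name: str, repo_name: str,
                             token_env: str = "DAGSHUB_TOKEN",
                             run: Callable[..., object] = subprocess.run) -> None:
    """Run the commands in the current directory (assumed to be the cloned repo)."""
    token = os.environ[token_env]  # raises KeyError if the variable is not set
    for cmd in dagshub_remote_commands(user_name, repo_name, token):
        # Passing a list (not a shell string) avoids quoting problems with the token.
        run(cmd, check=True)
```

Because of `--local`, DVC writes these settings to `.dvc/config.local`, which is not committed, so the token stays out of Git. The `run` parameter lets you pass a dry-run function instead of `subprocess.run`.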

Once the remote is configured, run:

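mkdir -p data/raw          # recreate the DVC-tracked directory locally
dvc commit data/raw.dvc    # record its current state in data/raw.dvc
dvc push -r origin         # push the data to DagsHub storage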
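The three steps above can be sketched in Python as well. This assumes `dvc` is on your PATH and that `repo_root` points at the cloned repository; `restore_raw_dir` is a hypothetical helper name, not part of DVC or the DagsHub client. If `dvc commit` asks for confirmation, the prompt is shown in your terminal as usual.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def restore_raw_dir(repo_root: str = ".", remote: str = "origin",
                    run: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Recreate data/raw, commit it with DVC and push it to the given remote."""
    root = Path(repo_root)
    # Equivalent of `mkdir -p data/raw`: no error if it already exists.
    (root / "data" / "raw").mkdir(parents=True, exist_ok=True)
    commands = [
        ["dvc", "commit", "data/raw.dvc"],
        ["dvc", "push", "-r", remote],
    ]
    for cmd in commands:
        # check=True makes subprocess.run raise CalledProcessError if dvc fails,
        # so the push is never attempted after a failed commit.
        run(cmd, cwd=root, check=True)
    return commands
```

By default this calls `subprocess.run`; passing a different `run` function is an easy way to print the commands instead of executing them.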

Then run your code again. It should work now!

That said, this is probably something we can improve on our side too, so I'll share it with the engineering team!

Thanks for the question :)
