How do I fix a "MemoryError" when downloading a dataset from Kaggle?

I want to download a dataset from Kaggle, but the download crashes when I run it on my local machine. Here is my code:

api = kaggle.KaggleApi(json_str)
api.authenticate()
api.datasets_download(owner_slug='headwater', dataset_slug='Camels')

Here is the crash report:

test_dload_archive.py:8: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\venv\lib\site-packages\kaggle\api\kaggle_api.py:1494: in datasets_download
    (data) = self.datasets_download_with_http_info(owner_slug, dataset_slug, **kwargs)  # noqa: E501
..\venv\lib\site-packages\kaggle\api\kaggle_api.py:1563: in datasets_download_with_http_info
    return self.api_client.call_api(
..\venv\lib\site-packages\kaggle\api_client.py:329: in call_api
    return self.__call_api(resource_path, method,
..\venv\lib\site-packages\kaggle\api_client.py:161: in __call_api
    response_data = self.request(
..\venv\lib\site-packages\kaggle\api_client.py:351: in request
    return self.rest_client.GET(url,
..\venv\lib\site-packages\kaggle\rest.py:247: in GET
    return self.request("GET", url,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <kaggle.rest.RESTClientObject object at 0x000001B1FAE01D80>
method = 'GET'
url = 'https://www.kaggle.com/api/v1/datasets/download/headwater/Camels'
query_params = []
headers = {'Accept': 'file', 'User-Agent': 'Swagger-Codegen/1/python'}
body = None, post_params = {}, _preload_content = True, _request_timeout = None
……
            if six.PY3:
>               r.data = r.data.decode('utf8')
E               MemoryError

..\venv\lib\site-packages\kaggle\rest.py:235: MemoryError

I think this happens because decompressing a large file uses up memory, but how do I fix it?

Update: on Linux, the crash looks like this instead:

            if six.PY3:
>               r.data = r.data.decode('utf8')
E               UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 14: invalid continuation byte

Tags: python, io, request, out-of-memory, kaggle

1 Answer

Note this line in rest.py:

r.data = r.data.decode('utf8')

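This is very naive, and for this particular dataset it is simply wrong: the download is binary data, not UTF-8 text, which is why the Linux run fails with a UnicodeDecodeError. The traceback also shows `_preload_content = True`, so the entire response is held in memory before it is decoded, which is consistent with the MemoryError.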
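To see why an unconditional decode breaks on binary content, here is a minimal sketch. The `payload` bytes are invented for illustration (a ZIP-style signature followed by 0xcb and a space) and are not taken from the real dataset; `try_decode` is a hypothetical helper, not part of the kaggle package.

```python
# Hypothetical stand-in for the start of a binary download; these bytes are
# invented for the demo and are not taken from the real dataset.
payload = b"PK\x03\x04" + bytes([0xCB, 0x20]) + b"more binary data"


def try_decode(data: bytes) -> str:
    """Mimic an unconditional UTF-8 decode and report what happens."""
    try:
        data.decode("utf8")
    except UnicodeDecodeError as exc:
        return f"decode failed at byte {exc.start}: {exc.reason}"
    return "decoded cleanly"


print(try_decode(payload))                       # binary data: decode fails
print(try_decode("plain text".encode("utf8")))   # real text: decode succeeds
```

The same call that is harmless for a JSON or text response raises as soon as the body is arbitrary binary data.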

You can decode this dataset with cp037, but to do that you need to edit rest.py accordingly.
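An alternative that avoids patching rest.py is to leave the response as bytes and copy it to disk in chunks instead of loading it all at once. The sketch below shows only that copying step with the standard library. `save_stream`, `chunk_size`, and the in-memory `fake_body` are hypothetical names chosen for illustration; a real download would pass an open HTTP response instead.

```python
import io
import os
import shutil
import tempfile


def save_stream(src, dest_path, chunk_size=1024 * 1024):
    """Copy a binary file-like object to dest_path, chunk_size bytes at a time.

    Only one chunk is held in memory at once, and nothing is decoded as text.
    """
    with open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return os.path.getsize(dest_path)


# Demo with an in-memory stand-in for the HTTP response body.
fake_body = io.BytesIO(bytes(range(256)) * 1000)
demo_path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
written = save_stream(fake_body, demo_path)
print(written)
```

In practice `src` can be any object with a binary `read()` method, for example the response returned by `urllib.request.urlopen`. The kaggle package also has a higher-level `dataset_download_files` method that writes the archive to disk; whether it is available depends on the installed version, so check before relying on it.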
