Cannot open a pickle file generated by a Snakemake pipeline

Question  Votes: 0  Answers: 2

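I have a data analysis pipeline made up of several steps. I built it as a Snakemake pipeline (Snakemake is new to me), and the output of each task (which is also the input of the next task) is a pickle file containing a DataFrame or a list of DataFrames. Everything works fine, except that I cannot open the pickle files manually. Note that the pipeline uses a dedicated conda environment.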

import _pickle
with open("testb/first/out/stacks.pkl", "rb") as f:
    data = _pickle.load(f)

I get this error:

AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from 'C:\\Users\\sebde\\anaconda3\\envs\\dbm\\lib\\site-packages\\pandas\\_libs\\internals.cp39-win_amd64.pyd'

Python 3.10.2, Snakemake-minimal 7.0.4 (as the documentation recommends, since I am on Windows), pandas 1.4.1

python pandas snakemake
2 Answers
0
votes

One simple thing to try (though it may not work) is to use pickle instead of _pickle:

# The "as" alias avoids having to adjust downstream code,
# but if that is not a concern, a regular import is better
# (i.e. "import pickle").
import pickle as _pickle

0
votes

It will probably help to use the same version of pandas when saving the file and when loading it.
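To compare the two environments, something like the following can be run once with the interpreter that writes the pickle and once with the interpreter that reads it. This is only a minimal sketch: the helper name describe_environment and the default package list are my own choices for illustration, not part of pandas or Snakemake.

```python
import sys
from importlib import metadata


def describe_environment(packages=("pandas", "joblib")):
    """Return the interpreter path, Python version, and installed package versions.

    The function name and the default package list are illustrative assumptions.
    """
    info = {
        "executable": sys.executable,
        "python": ".".join(str(part) for part in sys.version_info[:3]),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed in this interpreter
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the two runs report different pandas versions (or different interpreter paths), that mismatch is a likely cause of the error.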

I ran into a similar error, AttributeError: Can't get attribute '_unpickle_block', in the following situation.

import joblib
import pandas as pd  # 1.4.3


# Values are wrapped in lists: an all-scalar dict without an index raises ValueError.
df = pd.DataFrame({"a": [1], "b": [2]})
joblib.dump(df, "train.dump", compress=True)

# --------------------------------

# When loading this, pandas 1.3.5 is installed.
import joblib


def load(path):
    return joblib.load(path)


train_data = load("train.dump")
Traceback (most recent call last):
  File "/home/ec2-user/work/tools/script.py", line 45, in <module>
    train_data = load("train.dump")
  File "/home/ec2-user/work/tools/script.py", line 16, in load
    return joblib.load(path)
  File "/home/ec2-user/work/.venv/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/ec2-user/work/.venv/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1538, in load_stack_global
    self.append(self.find_class(module, name))
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 1582, in find_class
    return _getattribute(sys.modules[module], name)[0]
  File "/home/ec2-user/.pyenv/versions/3.10.12/lib/python3.10/pickle.py", line 331, in _getattribute
    raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from '/home/ec2-user/work/.venv/lib/python3.10/site-packages/pandas/_libs/internals.cpython-310-x86_64-linux-gnu.so'>

When loading the file, I was using pandas 1.3.5. However, the DataFrame I had saved in "train.dump" was created with pandas 1.4.3.

So I reinstalled pandas 1.4.3 and ran joblib.load again. With that, I was able to load "train.dump"!

See also another suggestion: AttributeError: Can't get attribute '_unpickle_block'
