Converting a list to a NumPy array: the requested array has an inhomogeneous shape after 1 dimensions

Question

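I am trying to load training and test data with a function called load_data_new, which reads the data from the topomaps/ folder and the labels from the labels/ folder. Both folders contain .npy files.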

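Specifically, the files in the topomaps/ folder do not all hold the same number of topomaps. For example, s01_trial03.npy contains 128 topomaps, while s01_trial12 contains 2944 (that is, their shapes can differ!).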

The labels/ folder contains the matching label files, one per topomap file.

In addition, the training data must contain only topomaps labelled 0, while the test data may contain topomaps labelled 0, 1, or 2. Here is my code:

def load_data_new(topomap_folder: str, labels_folder: str, test_size: float = 0.2) -> tuple:
    """
    Load and pair topomap data and corresponding label data from separate folders
    :param topomap_folder: (str) The path to the folder containing topomaps .npy files
    :param labels_folder: (str) The path to the folder containing labels .npy files
    :param test_size: (float) The proportion of data to be allocated to the testing set (default is 0.2)
    :return: (tuple) Two tuples, each containing a topomap ndarray and its corresponding label 1D-array.

    Note:
        The function assumes that the filenames of the topomaps and labels are in the same order.
        It also assumes that there is a one-to-one correspondence between the topomap files and the label files.
        If there are inconsistencies between the shapes of the topomap and label files, it will print a warning message.

    Example:
        topomap_folder = "topomaps"
        labels_folder = "labels"
        (x_train, y_train), (x_test, y_test) = load_data_new(topomap_folder, labels_folder, test_size=0.2)
    """
    topomap_files = os.listdir(topomap_folder)
    labels_files = os.listdir(labels_folder)

    # Sort the files to ensure the order is consistent
    topomap_files.sort()
    labels_files.sort()

    labels = []
    topomaps = []

    for topomap_file, label_file in zip(topomap_files, labels_files):
        if topomap_file.endswith(".npy") and label_file.endswith(".npy"):
            topomap_path = os.path.join(topomap_folder, topomap_file)
            label_path = os.path.join(labels_folder, label_file)

            topomap_data = np.load(topomap_path)
            label_data = np.load(label_path)

            if topomap_data.shape[0] != label_data.shape[0]:
                raise ValueError(f"Warning: Inconsistent shapes for {topomap_file} and {label_file}")

            topomaps.append(topomap_data)
            labels.append(label_data)

    x = np.array(topomaps)
    y = np.array(labels)

    # Training set only contains images whose label is 0 for anomaly detection
    train_indices = np.where(y == 0)[0]
    x_train = x[train_indices]
    y_train = y[train_indices]

    # Split the remaining data into testing sets
    remaining_indices = np.where(y != 0)[0]
    x_remaining = x[remaining_indices]
    y_remaining = y[remaining_indices]
    _, x_test, _, y_test = train_test_split(x_remaining, y_remaining, test_size=test_size)

    return (x_train, y_train), (x_test, y_test)



(x_train, y_train), (x_test, y_test) = load_data_new("topomaps", "labels")

Unfortunately, I get this error:

Traceback (most recent call last):
  File "/Users/alex/PycharmProjects/VAE-EEG-XAI/vae.py", line 574, in <module>
    (x_train, y_train), (x_test, y_test) = load_data_new("topomaps", "labels")
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/PycharmProjects/VAE-EEG-XAI/vae.py", line 60, in load_data_new
    x = np.array(topomaps)
        ^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (851,) + inhomogeneous part.

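This indicates that the elements of the topomaps list have different shapes: each element is the array loaded from one file, and the files hold different numbers of topomaps. A NumPy array needs elements of a consistent shape, so converting the list fails.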
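The failure can be reproduced without the data files. The sketch below is only an illustration: the 4x4 topomap size and the variable names trial_a and trial_b are assumptions, since the question does not state the real topomap dimensions.

```python
import numpy as np

# Hypothetical stand-ins for two loaded files holding 128 and 2944 topomaps.
# The 4x4 topomap size is an assumption made for this example only.
trial_a = np.zeros((128, 4, 4))
trial_b = np.zeros((2944, 4, 4))
topomaps = [trial_a, trial_b]

# The list elements differ in their first dimension.
shapes = [t.shape for t in topomaps]
print(shapes)

try:
    x = np.array(topomaps)
except ValueError as exc:
    # Recent NumPy versions refuse to build an array from ragged input.
    print(exc)
```

Because the two elements disagree on their first dimension, NumPy cannot stack them into one regular array with a leading axis of length 2.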

How can I fix this?

python numpy machine-learning scikit-learn numpy-ndarray
1 Answer

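I solved the problem like this: instead of appending each file's whole array to the list, I append every topomap (and its label) individually, so all list elements have the same shape.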

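import os

import numpy as np
from sklearn.model_selection import train_test_split
from tqdm import tqdm


def load_data(topomaps_folder: str, labels_folder: str, test_size=0.2) -> tuple:
    x, y = _create_dataset(topomaps_folder, labels_folder)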

    # Training set only contains images whose label is 0 for anomaly detection
    train_indices = np.where(y == 0)[0]
    x_train = x[train_indices]
    y_train = y[train_indices]

    # Split the remaining data into testing sets
    remaining_indices = np.where(y != 0)[0]
    x_remaining = x[remaining_indices]
    y_remaining = y[remaining_indices]
    _, x_test, _, y_test = train_test_split(x_remaining, y_remaining, test_size=test_size)

    return (x_train, y_train), (x_test, y_test)


def _create_dataset(topomaps_folder, labels_folder):
    # Sort both listings so that topomap and label files pair up by name
    topomaps_files = sorted(os.listdir(topomaps_folder))
    labels_files = sorted(os.listdir(labels_folder))

    x = []
    y = []

    n_files = len(topomaps_files)

    for topomaps_file, labels_file in tqdm(zip(topomaps_files, labels_files), total=n_files, desc="Loading data set"):
        topomaps_array = np.load(os.path.join(topomaps_folder, topomaps_file))
        labels_array = np.load(os.path.join(labels_folder, labels_file))
        if topomaps_array.shape[0] != labels_array.shape[0]:
            raise ValueError(f"Shapes must be equal for {topomaps_file} and {labels_file}")
        # Append each topomap and its label individually, so every list element
        # has the same shape no matter how many topomaps a file holds
        for i in range(topomaps_array.shape[0]):
            x.append(topomaps_array[i])
            y.append(labels_array[i])

    # All elements now share one shape, so the conversion yields a regular array
    x = np.array(x)
    y = np.array(y)

    return x, y
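
As a possible alternative to the per-topomap loop above, the per-file arrays could be joined along their first axis with np.concatenate. This is only a sketch under assumptions, not the method used in the answer: stack_trials is a hypothetical helper, the 4x4 topomap size is made up, and the labels are assumed to be 1D arrays with one entry per topomap.

```python
import numpy as np


def stack_trials(topomap_arrays, label_arrays):
    """Join per-file arrays along the first axis (hypothetical helper)."""
    for topomaps, labels in zip(topomap_arrays, label_arrays):
        if topomaps.shape[0] != labels.shape[0]:
            raise ValueError("Each file needs exactly one label per topomap")
    # Only the first dimension may differ between files; the rest must match.
    x = np.concatenate(topomap_arrays, axis=0)
    y = np.concatenate(label_arrays, axis=0)
    return x, y


# Stand-in data: two "files" with 128 and 2944 topomaps of an assumed 4x4 size.
topomap_arrays = [np.zeros((128, 4, 4)), np.ones((2944, 4, 4))]
label_arrays = [np.zeros(128, dtype=int), np.ones(2944, dtype=int)]

x, y = stack_trials(topomap_arrays, label_arrays)
print(x.shape, y.shape)  # (3072, 4, 4) (3072,)

# Selecting the label-0 topomaps then works with a boolean mask.
x_train = x[y == 0]
print(x_train.shape)  # (128, 4, 4)
```

Concatenating once avoids growing a Python list one topomap at a time, which may matter when the files hold thousands of topomaps each.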