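I have two different folders, one containing my images and one containing random images. How do I split them for training and testing a CNN model?
I tried to use them as training data and test data, but I don't know how to do it. I need to know the process for building our own dataset from scratch.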
Since you are only asking how to split the images into training and test sets, here is one way to do it:
import os
import shutil
import random
# Define paths for source and destination folders
source_folder = 'path/to/source_folder' # Replace with your source folder path
train_folder = 'path/to/train_folder' # Replace with your train folder path
test_folder = 'path/to/test_folder' # Replace with your test folder path
# Define the ratio for splitting (e.g., 80% train, 20% test)
train_ratio = 0.8
# Create destination folders if they don't exist
os.makedirs(train_folder, exist_ok=True)
os.makedirs(test_folder, exist_ok=True)
# List all image files in the source folder
image_files = [file for file in os.listdir(source_folder) if file.lower().endswith(('.jpeg', '.png', '.jpg'))]
# Shuffle the list to randomize the selection
random.shuffle(image_files)
# Calculate the number of files for training
train_count = int(len(image_files) * train_ratio)
# Split files into train and test sets
train_images = image_files[:train_count]
test_images = image_files[train_count:]
# Copy images into the respective folders (the originals stay in the source folder)
for image in train_images:
    src_path = os.path.join(source_folder, image)
    dst_path = os.path.join(train_folder, image)
    shutil.copyfile(src_path, dst_path)
for image in test_images:
    src_path = os.path.join(source_folder, image)
    dst_path = os.path.join(test_folder, image)
    shutil.copyfile(src_path, dst_path)
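P.S. You still need to label the data for your different classes, using whatever annotation format your CNN model expects. Also, this only splits your data into train and test directories; it does not read the images and feed them to your model.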
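Since you have two folders (your images and random images), one common way to handle the labeling is to let the folder name be the label and split each class separately. Below is a minimal sketch of that idea, not a definitive implementation. The function name `split_by_class`, the constant `IMAGE_EXTENSIONS`, the label names, and all paths are assumptions made up for illustration. It uses a fixed seed so the split is reproducible, and it returns `(path, label)` pairs that you could later pass to whatever image-loading code you use.

```python
import random
import shutil
from pathlib import Path

# Hypothetical set of extensions to treat as images; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def split_by_class(class_dirs, output_root, train_ratio=0.8, seed=0):
    """Copy each class folder into output_root/{train,test}/<label>/.

    class_dirs maps a label name to the folder holding that class's images.
    Returns {"train": [(path, label), ...], "test": [(path, label), ...]}.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    output_root = Path(output_root)
    splits = {"train": [], "test": []}

    for label, folder in class_dirs.items():
        # Sort first so the shuffle does not depend on directory listing order
        files = sorted(
            p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        rng.shuffle(files)
        train_count = int(len(files) * train_ratio)

        for split_name, chunk in (("train", files[:train_count]),
                                  ("test", files[train_count:])):
            dest_dir = output_root / split_name / label
            dest_dir.mkdir(parents=True, exist_ok=True)
            for src in chunk:
                dst = dest_dir / src.name
                shutil.copyfile(src, dst)  # copy, so the originals are kept
                splits[split_name].append((dst, label))
    return splits


# Example call (placeholder paths and label names, replace with your own):
# splits = split_by_class(
#     {"me": "path/to/my_images", "random": "path/to/random_images"},
#     "path/to/dataset",
# )
```

The resulting `dataset/train/<label>/` and `dataset/test/<label>/` layout is the one that directory-based loaders such as Keras `image_dataset_from_directory` or torchvision `ImageFolder` read labels from, so the split folders can usually be handed to them directly. Splitting each class on its own also keeps the train/test ratio the same for both classes, which matters if one folder is much larger than the other.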