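Here is a function that iterates over the 6 folders named in the "CATEGORIES" array, loads the images in each one as tensors, and returns them as lists.
However, when I run the same function with a different directory as the argument, it gives the same output.
The code is as follows: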
def preprocess(directory, img_size):
    X = []
    Y = []
    CATEGORIES = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']
    label = 0
    for category in CATEGORIES:
        path = os.path.join(DIRECTORY, category)
        for img in os.listdir(path):
            tensor = cv2.imread(os.path.join(path, img))
            tensor = cv2.resize(tensor, (img_size, img_size))
            X.append(tensor)
            Y.append(label)
        label += 1
    return X, Y

TRAIN_DIRECTORY = 'Data/seg_train'
TEST_DIRECTORY = 'Data/seg_test'
train_X, train_Y = preprocess(TRAIN_DIRECTORY, 150)
test_X, test_Y = preprocess(TEST_DIRECTORY, 150)
This is the output when I compare the train and test arrays:
print("Train array size: ",len(train_Y), "\nTest array size: ",len(test_Y))
Output:
Train array size: 14034
Test array size: 14034
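In your code you are using DIRECTORY, which is a different name from the parameter directory (Python is case-sensitive). It refers to a global that still holds whatever value was last assigned to it, so every call reads the same folder and gives you the same number of files. Replace it with the parameter you declared in the function: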
path = os.path.join(directory, category)
This should solve your problem, since the identical len values come from reading the same set of files on both calls.
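To see the effect in isolation, here is a minimal sketch that reproduces the same mix-up using only the standard library. The helper names (count_buggy, count_fixed, make_dir_with_files) and the temporary folders are hypothetical and exist only for this illustration; the global DIRECTORY stands in for a variable left over from earlier in the session.

```python
import os
import tempfile

# Hypothetical leftover global, standing in for a variable set earlier.
DIRECTORY = None

def count_buggy(directory):
    # Bug: ignores the parameter and reads the global DIRECTORY instead.
    return len(os.listdir(DIRECTORY))

def count_fixed(directory):
    # Fix: uses the parameter that was actually passed in.
    return len(os.listdir(directory))

def make_dir_with_files(n):
    # Create a throwaway folder containing n empty files.
    path = tempfile.mkdtemp()
    for i in range(n):
        open(os.path.join(path, f"{i}.txt"), "w").close()
    return path

train_dir = make_dir_with_files(3)
test_dir = make_dir_with_files(1)
DIRECTORY = train_dir  # the global still points at the training folder

buggy_counts = (count_buggy(train_dir), count_buggy(test_dir))
fixed_counts = (count_fixed(train_dir), count_fixed(test_dir))
print(buggy_counts)  # (3, 3)
print(fixed_counts)  # (3, 1)
```

The buggy version reports the same count for both folders no matter what is passed in, which matches the identical array sizes in the question.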
Here, this code should work:
import os
import cv2

def preprocess(directory, img_size):
    X = []
    Y = []
    CATEGORIES = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']
    # enumerate yields the numeric label (0 to 5) alongside each category name
    for label, category in enumerate(CATEGORIES):
        path = os.path.join(directory, category)
        # Stop at the first category folder that does not exist
        if not os.path.exists(path):
            break
        for img in os.listdir(path):
            tensor = cv2.imread(os.path.join(path, img))
            tensor = cv2.resize(tensor, (img_size, img_size))
            X.append(tensor)
            Y.append(label)
    return X, Y

TRAIN_DIRECTORY = 'Data/seg_train'
TEST_DIRECTORY = 'Data/seg_test'
train_X, train_Y = preprocess(TRAIN_DIRECTORY, 150)
test_X, test_Y = preprocess(TEST_DIRECTORY, 150)
If it doesn't work or it isn't what you wanted, let me know and I'll rewrite it.
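One way to check the result independently of the image loading is to count the files in each category folder and compare the totals for the two directories. The sketch below is only an illustration: count_per_category is a hypothetical helper, and the demo runs on a throwaway folder rather than on the real Data/seg_train and Data/seg_test.

```python
import os
import tempfile

CATEGORIES = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']

def count_per_category(directory, categories=CATEGORIES):
    """Return {category: number of entries}, skipping folders that are missing."""
    counts = {}
    for category in categories:
        path = os.path.join(directory, category)
        if os.path.isdir(path):
            counts[category] = len(os.listdir(path))
    return counts

# Demo on a temporary folder with two of the six categories present.
demo_root = tempfile.mkdtemp()
for name, n in [('buildings', 2), ('sea', 1)]:
    os.makedirs(os.path.join(demo_root, name))
    for i in range(n):
        open(os.path.join(demo_root, name, f"{i}.jpg"), "w").close()

demo_counts = count_per_category(demo_root)
print(demo_counts)  # {'buildings': 2, 'sea': 1}
```

On the real data, sum(count_per_category(TRAIN_DIRECTORY).values()) and the same call with TEST_DIRECTORY should give the sizes you expect for train_Y and test_Y, and the two totals should differ if the folders hold different numbers of images.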