I'm having trouble training an image-comparison model, and I've reduced it to the following minimal problem.
I feed the model pairs of images (3x128x128) that are either all black or all white. The model runs the two images through separate resnet models, concatenates the outputs, and passes them through a fully connected layer. It should return 1.0 if the two images are the same color (both black or both white) and 0.0 otherwise. However, even though this task should be trivial, the model converges to always predicting ~0.5.
Model:
class TemplateEvaluator(nn.Module):
    def __init__(self, q_encoder=resnet18(), t_encoder=resnet18()):
        super(TemplateEvaluator, self).__init__()
        self.q_encoder = q_encoder
        self.t_encoder = t_encoder
        # Set requires_grad to True to train the resnets
        for param in self.q_encoder.parameters():
            param.requires_grad = True
        for param in self.t_encoder.parameters():
            param.requires_grad = True
        self.fc = nn.Sequential(
            nn.Linear(2000, 1),
            nn.Sigmoid()
        )

    def forward(self, data):
        q = data[0]
        t = data[1]
        # Add a batch dimension to singular images
        if q.ndim == 3: q = q.unsqueeze(0)
        if t.ndim == 3: t = t.unsqueeze(0)
        q = self.q_encoder(q)
        t = self.t_encoder(t)
        res = self.fc(torch.cat([q, t], -1)).flatten()
        return res
Data loader:
class BlackOrWhiteDataset(Dataset):
    def __init__(self):
        self.tf = transforms.ToTensor()

    def __getitem__(self, i):
        black = (0, 0, 0)        # RGB: black is all zeros,
        white = (255, 255, 255)  # white is all 255s
        x1_col = black if (np.random.random() > 0.5) else white
        x2_col = black if (np.random.random() > 0.5) else white
        y = torch.tensor(x1_col == x2_col, dtype=torch.float)
        x1 = Image.new('RGB', (img_width, img_width), x1_col)
        x2 = Image.new('RGB', (img_width, img_width), x2_col)
        return self.tf(x1), self.tf(x2), y

    def __len__(self):
        return 100

def create_data_loader(dataset, batch_size, verbose=True):
    dl = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True,
        collate_fn=lambda x: tuple(x_.to(device) for x_ in default_collate(x)))
    return dl
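One detail worth sanity-checking in the dataset is the label: `x1_col == x2_col` compares RGB tuples, which works as intended because Python compares tuples element-wise:

```python
# Tuple equality compares element-wise, so RGB triples work as equality labels.
black = (0, 0, 0)
white = (255, 255, 255)
print(float(black == black))  # → 1.0
print(float(black == white))  # → 0.0
```

So the labels themselves are correct; the problem lies elsewhere.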
Training:
t_eval = TemplateEvaluator().to(device)
opt = optim.SGD(t_eval.parameters(), lr=0.001, momentum=0.01)
epochs = 10
losses = []
for epoch in tqdm(range(epochs)):
    t_eval.train()
    for X1, X2, Y in dl:
        Y_pred = t_eval(torch.stack([X1, X2]))
        loss = F.mse_loss(Y_pred, Y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sys.stdout.write('\r')
        sys.stdout.write("loss: %f" % loss.item())
        sys.stdout.flush()
    losses.append(loss.item())
plt.plot(losses)
plt.ylim(0, 1)
Results:
loss: 0.259106
loss: 0.241787
loss: 0.258519
loss: 0.250100
loss: 0.257565
loss: 0.264662
loss: 0.246792
loss: 0.260988
loss: 0.241590
loss: 0.250159
100%|██████████| 10/10 [00:13<00:00, 1.35s/it]
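Note that the logged loss hovers around 0.25. That is exactly what a model stuck at a constant output of 0.5 produces: against any 0/1 label, the squared error of a 0.5 prediction is 0.25. A quick sanity check (plain Python, assuming only 0/1 labels):

```python
# If the model always predicts 0.5, the squared error against a 0/1 label
# is (0.5)**2 = 0.25 for every sample, so the MSE plateaus at 0.25.
labels = [0.0, 1.0, 1.0, 0.0]  # any mix of 0/1 labels
preds = [0.5] * len(labels)
mse = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
print(mse)  # → 0.25
```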
Example case:
t_eval.eval()
for X1, X2, Y in dl:
    view([X1[0], X2[0]])
    print(Y[0].item())
    print(t_eval(torch.stack([X1[0], X2[0]])).item())
    break
This prints the true label followed by the model's prediction; in the example runs (screenshots omitted) the prediction is ~0.5 regardless of the label.
When `Y` is set to always zero, the model does converge so that Y_pred approaches zero, so the optimizer is working. When `Y` is set to indicate whether the first image is black, the model also converges as expected, and likewise for the second image. So the model can interpret each of the two images individually.
It therefore seems the model is unable to combine information from the two inputs, and I don't understand why.
The model that has to check for equality consists of a single dense layer. However, a single-layer perceptron cannot learn the XOR function, and by extension cannot learn XNOR (i.e. equality); this is a very famous result from the early history of machine learning.
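This limitation can be seen directly with a tiny least-squares fit: over the four points of {0,1}², the best linear predictor of XNOR is the constant 0.5, which is exactly the plateau observed above (a minimal numpy sketch; the two binary inputs stand in for the two image encodings):

```python
import numpy as np

# XNOR truth table: the label is 1 exactly when the two inputs are equal.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([1, 0, 0, 1], dtype=float)

# Fit a single linear layer (weights + bias) by least squares.
A = np.hstack([X, np.ones((4, 1))])  # append a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = A @ w
print(w)     # ≈ [0, 0, 0.5]: the input weights vanish, only the bias remains
print(pred)  # ≈ [0.5, 0.5, 0.5, 0.5]: constant 0.5, as observed in training
```

Giving the head a hidden nonlinearity, for example replacing `nn.Linear(2000, 1)` with `nn.Linear(2000, 64)`, `nn.ReLU()`, `nn.Linear(64, 1)` (sizes here are an illustrative choice), makes XNOR representable and should let the model converge.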