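Background
I'm experimenting with a model that should match a query image of a known object to the corresponding template images in which the object may have the same orientation. (I'll be dealing with symmetric objects and heavy occlusion, so this relationship is often one-to-many.)
I give the model an image pair as input (query image + candidate template image) and expect 0.0 if the model thinks the objects do not have the same orientation, and 1.0 if it thinks they do. (I train with L1_loss.)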
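I train this model in batches on synthetic data, where for each query image I provide one positive pair (the query with its associated template) and one negative pair (the query with a different template), giving a 50/50 mix in each batch.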
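Problem
Strangely, when the negative templates are in-plane rotations of the positive template, the model trains and performs almost suspiciously well (mean classification for positive cases = ~0.99, negative cases = ~0.1). However, when the negative templates are completely random templates with any 3D object orientation, the model struggles badly (mean classification for positive cases = ~0.75, negative cases = ~0.5). This seems odd to me, because there should be more difference between the positive and negative cases in the second setup, so they should be easier to tell apart.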
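One way to put a number on "more difference" between a positive and a negative template is the geodesic angle between their orientations. The following is only a minimal sketch: it assumes the orientation returned with each sample (`rot` in the training step) is a 3x3 rotation matrix, which the post does not state, and `rotation_angle_deg` and `rot_z` are hypothetical helper names. For symmetric objects the raw angle can overstate the visual difference, so treat it as a rough indicator.

```python
import numpy as np

def rotation_angle_deg(r_a, r_b):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    r_rel = r_a.T @ r_b
    # For a rotation by theta, trace(R) = 1 + 2*cos(theta)
    cos_theta = (np.trace(r_rel) - 1.0) / 2.0
    # Clip to guard against rounding slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def rot_z(deg):
    """Rotation about the z-axis (hypothetical helper, used only for the demo)."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

if __name__ == "__main__":
    # Angle between a reference orientation and a 90 degree in-plane rotation
    print(rotation_angle_deg(np.eye(3), rot_z(90)))
```

Logging this angle for each negative pair would show how far the random negatives actually are from the positives, rather than assuming they are far apart.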
Code
Model:
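import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class TemplateEvaluator(nn.Module):
    def __init__(self, q_encoder=None, t_encoder=None):
        super().__init__()
        # Build the encoders here rather than as default arguments, so every
        # instance gets its own pretrained ResNet-18 instead of a shared one.
        if q_encoder is None:
            q_encoder = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        if t_encoder is None:
            t_encoder = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        self.q_encoder = q_encoder
        self.t_encoder = t_encoder
        # Each ResNet-18 outputs 1000 values, so the concatenation has 2000.
        self.fc = nn.Sequential(
            nn.Linear(2000, 1),
            nn.Sigmoid()
        )

    def forward(self, data):
        # data[0] is the query batch, data[1] is the template batch
        q = data[0]
        t = data[1]
        q = self.q_encoder(q)
        t = self.t_encoder(t)
        res = self.fc(torch.cat([q,t],-1))
        return res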
Training step:
def template_eval_train_step(iteration, models, data, codebook, opts=None, show=False, metric_label=''):
    # Get query image, associated codebook template ID, and associated orientation
    q_img, cb_id, rot = data
    n = q_img.shape[0]
    t_eval = models[0]
    # Get random template IDs
    cb_id_rand = np.random.choice(codebook["size"],n)
    # Get associated and random template images
    t_img = torch.stack([cb_get_img(i,codebook) for i in cb_id]).to(device)
    # Uncomment to use random template as neg cases
    t_img_rand = torch.stack([cb_get_img(i,codebook) for i in cb_id_rand]).to(device)
    # Uncomment to use in-plane rotations of pos template as neg cases
    # t_img_rand = torch.stack([rotate_image_tensor(y,np.random.random()*360) for y in t_img])
    # Cases with similar template image ('Positive')
    p_cases = torch.stack([q_img.permute(0, 3, 1, 2),t_img.permute(0, 3, 1, 2)])
    # Cases with random template image ('Negative')
    n_cases = torch.stack([q_img.permute(0, 3, 1, 2),t_img_rand.permute(0, 3, 1, 2)])
    # Mix together for 50/50 distribution in batch
    mixed_cases = torch.concat([p_cases,n_cases], 1)
    # Run model
    c = t_eval(mixed_cases)
    # Get classification for pos and neg cases
    p_cls = c[:n]
    n_cls = c[n:]
    # Compute loss
    p_loss = F.l1_loss(p_cls, torch.ones_like(p_cls, requires_grad=True))
    n_loss = F.l1_loss(n_cls, torch.zeros_like(n_cls, requires_grad=True))
    loss = (p_loss + n_loss)/2
    # Visualise pos and neg case at i=0
    if show:
        i=0
        view([q_img[i].detach().cpu().numpy(), t_img[i].detach().cpu().numpy()])
        print("p_cls:",p_cls[i].detach().cpu().numpy())
        view([q_img[i].detach().cpu().numpy(), t_img_rand[i].detach().cpu().numpy()])
        print("n_cls:",n_cls[i].detach().cpu().numpy())
    # Run optimizer (if given)
    if opts is not None:
        opts[0].zero_grad()
        loss.backward()
        # Print gradient info
        if show:
            t_eval.cpu()
            plot_grad_flow(t_eval.named_parameters())
            t_eval.to(device)
        opts[0].step()
    # Compute eval metrics
    p_rate = p_cls.sum() / n
    n_rate = n_cls.sum() / n
    # Garbage collection
    gc.collect()
    return [ {"label": metric_label, "name": "loss", "value":loss.cpu().item()},
             {"label": metric_label, "name": "p_rate", "value":p_rate.cpu().item()},
             {"label": metric_label, "name": "n_rate", "value":n_rate.cpu().item()}]
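A side note on the random negatives in the step above: `np.random.choice(codebook["size"], n)` can occasionally draw the positive template's own ID, which yields a pair labeled 0.0 that is identical to a pair labeled 1.0. The sketch below shows one possible way to avoid that. It assumes template IDs are the integers 0 to `codebook["size"] - 1` (consistent with the `np.random.choice` call) and that the codebook holds at least two templates; `sample_negative_ids` is a hypothetical helper, not part of the original code. It does not handle symmetric objects, where a different ID can still show the same apparent orientation.

```python
import numpy as np

def sample_negative_ids(pos_ids, codebook_size, rng=None):
    """Draw one random template ID per query that differs from its positive ID."""
    rng = np.random.default_rng() if rng is None else rng
    pos_ids = np.asarray(pos_ids)
    # Draw an offset in [1, codebook_size - 1] and wrap around, so the result
    # is uniform over every ID except the positive one.
    offsets = rng.integers(1, codebook_size, size=pos_ids.shape)
    return (pos_ids + offsets) % codebook_size

if __name__ == "__main__":
    pos = np.array([0, 3, 7, 9])
    print(sample_negative_ids(pos, codebook_size=10))
```

In the training step this would stand in for the `cb_id_rand` line, for example `cb_id_rand = sample_negative_ids(cb_id, codebook["size"])`, assuming `cb_id` can be converted to a NumPy integer array.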
Training loop:
def fit(epochs, models, init_train, init_verify, train_step, verify_step, opts, train_dl, verify_dl, eval_dl, codebook, vis_epoch_step=10):
    train_data = []
    verify_data = []
    eval_data = []
    for epoch in tqdm(range(epochs)):
        init_train(epoch, models)
        i = 0
        for data in train_dl:
            train_metrics = train_step(epoch, models, data, opts=opts, codebook=codebook, show=epoch % vis_epoch_step == 0 and i == 0)
            train_data.append(train_metrics)
            i = i + 1
            n = len(train_dl)
            p = round((i/n)*100)
            if p>0:
                sys.stdout.write('\r')
                bar_len = round(p/5)
                empty_len = round((100-p)/5)
                sys.stdout.write("Train batch %d/%d [%s%s] %d%%" % (i, n, '#'*bar_len, '_'*empty_len, p))
                sys.stdout.flush()
        # verification step
        init_verify(epoch, models)
        with torch.no_grad():
            i = 0
            for data in verify_dl:
                verify_metrics = verify_step(epoch, models, data, codebook=codebook, show=epoch % vis_epoch_step == 0 and i == 0)
                verify_data.append(verify_metrics)
                i = i + 1
                n = len(verify_dl)
                p = round((i/n)*100)
                if p>0:
                    sys.stdout.write('\r')
                    bar_len = round(p/5)
                    empty_len = round((100-p)/5)
                    sys.stdout.write("Verification batch %d/%d [%s%s] %d%%" % (i, n, '#'*bar_len, '_'*empty_len, p))
                    sys.stdout.flush()
...
Results with in-plane rotations of the positive template as negative templates
t_img_rand = torch.stack([rotate_image_tensor(y,np.random.random()*360) for y in t_img])
Training (p/n_rate are the mean pos/neg case classifications):
Example cases:
p_cls:[0.998]
n_cls:[0.000]
Results with random templates as negative templates (same model initialization, optimizer, and hyperparameters):
cb_id_rand = np.random.choice(codebook["size"],n)
t_img_rand = torch.stack([cb_get_img(i,codebook) for i in cb_id_rand]).to(device)
Training (p/n_rate are the mean pos/neg case classifications):
Example cases:
p_cls:[0.001]
n_cls:[0.998]
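I think I found the problem. Generating in-plane rotations of the positive template creates templates that are not part of the initial discrete template set. The model seems to memorize the initial template set and simply outputs 0.0 whenever the given template does not appear to belong to it.
Unfortunately, the "good" performance was just overfitting. The "bad" performance is the honest performance.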
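A quick way to check this explanation is to score templates while hiding the query: if the model still separates codebook templates from rotated ones when the query is blank, it is deciding from the template alone. The following is a rough sketch, not a definitive test. It assumes the same input layout as the training step (index 0 = query batch, index 1 = template batch, both of the same shape), and `template_only_scores` and `TinyPairModel` are hypothetical names; the tiny model is only a stand-in so the example runs without downloading ResNet weights. An all-zero query is itself out of distribution, so shuffling the queries within a batch may be a fairer variant.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def template_only_scores(model, t_img):
    """Score templates of shape (N, C, H, W) against an all-zero query."""
    was_training = model.training
    model.eval()
    blank_q = torch.zeros_like(t_img)
    # Same layout as the training step: index 0 = query, index 1 = template
    scores = model(torch.stack([blank_q, t_img])).squeeze(-1)
    model.train(was_training)  # restore the previous train/eval mode
    return scores

class TinyPairModel(nn.Module):
    """Stand-in for TemplateEvaluator (hypothetical, for the demo only)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(6, 1), nn.Sigmoid())

    def forward(self, data):
        q = data[0].mean(dim=(2, 3))  # (N, 3) per-channel mean of the query
        t = data[1].mean(dim=(2, 3))  # (N, 3) per-channel mean of the template
        return self.fc(torch.cat([q, t], -1))

if __name__ == "__main__":
    model = TinyPairModel()
    templates = torch.rand(4, 3, 8, 8)
    print(template_only_scores(model, templates))
```

With the real model, comparing the mean score on codebook templates against the mean score on `rotate_image_tensor` outputs would show whether the gap survives without any query information.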