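Template-matching model for object orientation estimation converges quickly with in-plane rotations only, but fails with full 3D orientations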

Question

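Background

I am experimenting with a model that should match a query image of a known object to the corresponding template image(s) that may show the same orientation. (I will be dealing with symmetric objects and heavy occlusion, so this relationship will generally be one-to-many.)

I give the model an image pair as input (query image + candidate template image) and expect 0.0 if the model thinks the objects do not have the same orientation, and 1.0 if it thinks they do. (I train with L1_loss.)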

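I train this model in batches on synthetic data, where for each query image I provide:

a positive case: the query image with its correct associated template image (expected classification 1.0),
and a "negative" case: the same query image, but with a random template image (expected classification 0.0).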
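One detail of the negative case is worth spelling out: if the random template ID is drawn uniformly from the whole codebook, it will occasionally coincide with the positive ID, so a correct pair gets labelled 0.0. The sketch below shows one way to exclude the positive ID. `sample_negative_ids` is a hypothetical helper, not part of the code in this question, and it assumes template IDs are the integers `0 .. codebook_size - 1` and that the codebook holds at least two templates.

```python
import random

def sample_negative_ids(positive_ids, codebook_size, rng=None):
    """Draw one random template ID per sample that differs from its positive ID."""
    rng = rng or random.Random()
    negatives = []
    for pos in positive_ids:
        # Draw from codebook_size - 1 slots, then shift past the positive ID
        neg = rng.randrange(codebook_size - 1)
        if neg >= pos:
            neg += 1
        negatives.append(neg)
    return negatives

positive_ids = [0, 3, 4, 2, 4]
negative_ids = sample_negative_ids(positive_ids, codebook_size=5)
print(list(zip(positive_ids, negative_ids)))
```

This only rules out the identical template. For symmetric objects, other templates can still look the same as the positive one, so some label noise may remain.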

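Problem

Strangely, when the negative templates are in-plane rotations of the positive template, the model trains and performs almost suspiciously well (mean classification of positive cases = ~.99, negative cases = ~.1). However, when the negative templates are completely random templates with any 3D object orientation, the model struggles considerably (mean classification of positive cases = ~.75, negative cases = ~.5). This seems strange to me, because in that setting there should be more difference between the pos. and neg. templates, so they should be easier to tell apart.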
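To put the two sets of numbers on the same scale as the training loss: the model ends in a sigmoid, so every output lies in [0, 1]. The L1 loss against a target of 1.0 is therefore 1 minus the mean positive output, and the L1 loss against a target of 0.0 is the mean negative output. The small worked example below applies this to the mean classifications quoted above, combining the two halves the same way as the training step, `(p_loss + n_loss) / 2`. The function name is only illustrative.

```python
def l1_loss_from_rates(p_rate, n_rate):
    """Combined L1 loss implied by mean positive/negative outputs in [0, 1]."""
    p_loss = 1.0 - p_rate  # mean |p - 1| when every output is <= 1
    n_loss = n_rate        # mean |p - 0| when every output is >= 0
    return (p_loss + n_loss) / 2

in_plane = l1_loss_from_rates(0.99, 0.1)
random_3d = l1_loss_from_rates(0.75, 0.5)
chance = l1_loss_from_rates(0.5, 0.5)  # a model that always outputs 0.5

print(f"in-plane negatives: {in_plane:.3f}")   # 0.055
print(f"random negatives:   {random_3d:.3f}")  # 0.375
print(f"constant 0.5:       {chance:.3f}")     # 0.500
```

By this measure, the run with random 3D negatives sits much closer to a constant 0.5 output than to the in-plane run.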

Code

Model:

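import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class TemplateEvaluator(nn.Module):
    def __init__(self, q_encoder=None, t_encoder=None):
        super().__init__()
        # Build the encoders here rather than as default arguments: defaults are
        # evaluated once at definition time and would be shared by every instance
        self.q_encoder = q_encoder if q_encoder is not None else resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        self.t_encoder = t_encoder if t_encoder is not None else resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)

        # Each ResNet-18 outputs 1000 values, so the concatenated vector has 2000
        self.fc = nn.Sequential(
            nn.Linear(2000, 1),
            nn.Sigmoid()
        )

    def forward(self, data):
        # data[0]: batch of query images, data[1]: batch of template images
        q = self.q_encoder(data[0])
        t = self.t_encoder(data[1])
        res = self.fc(torch.cat([q, t], -1))
        return res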

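Training step:

cb_id contains the ID of the associated correct template (the template with the smallest angular difference)
t_img_rand is the negative template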

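import gc

import numpy as np
import torch
import torch.nn.functional as F

# device, cb_get_img, rotate_image_tensor, view and plot_grad_flow are assumed to be defined elsewhere
def template_eval_train_step(iteration, models, data, codebook, opts=None, show=False, metric_label=''):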
    # Get query image, associated codebook template ID, and associated orientation
    q_img, cb_id, rot = data
    n = q_img.shape[0]
    t_eval = models[0]
    
    # Get random template IDs
    cb_id_rand = np.random.choice(codebook["size"],n)

    # Get associated and random template images
    t_img = torch.stack([cb_get_img(i,codebook) for i in cb_id]).to(device)

    # Uncomment to use random template as neg cases
    t_img_rand = torch.stack([cb_get_img(i,codebook) for i in cb_id_rand]).to(device)
    # Uncomment to use in-plane rotations of pos template as neg cases
    # t_img_rand = torch.stack([rotate_image_tensor(y,np.random.random()*360) for y in t_img])
    
    # Cases with similar template image ('Positive')
    p_cases = torch.stack([q_img.permute(0, 3, 1, 2),t_img.permute(0, 3, 1, 2)])

    # Cases with random template image ('Negative')
    n_cases = torch.stack([q_img.permute(0, 3, 1, 2),t_img_rand.permute(0, 3, 1, 2)])

    # Mix together for 50/50 distribution in batch
    mixed_cases = torch.concat([p_cases,n_cases], 1)

    # Run model
    c = t_eval(mixed_cases)

    # Get classification for pos and neg cases
    p_cls = c[:n]
    n_cls = c[n:]

    # Compute loss
    # Targets are constants, so they do not need gradients
    p_loss = F.l1_loss(p_cls, torch.ones_like(p_cls))
    n_loss = F.l1_loss(n_cls, torch.zeros_like(n_cls))
    loss = (p_loss + n_loss)/2
    
    # Visualise pos and neg case at i=0
    if show:
        i=0
        view([q_img[i].detach().cpu().numpy(), t_img[i].detach().cpu().numpy()])
        print("p_cls:",p_cls[i].detach().cpu().numpy())
        view([q_img[i].detach().cpu().numpy(), t_img_rand[i].detach().cpu().numpy()])
        print("n_cls:",n_cls[i].detach().cpu().numpy())

    # Run optimizer (if given)
    if opts is not None:
        opts[0].zero_grad()
        loss.backward()
        
        # Print gradient info
        if show:
            t_eval.cpu()
            plot_grad_flow(t_eval.named_parameters())
            t_eval.to(device)
        
        opts[0].step()

    # Compute eval metrics
    p_rate = p_cls.sum() / n
    n_rate = n_cls.sum() / n

    # Garbage collection 
    gc.collect()

    return [ {"label": metric_label, "name": "loss", "value":loss.cpu().item()},
             {"label": metric_label, "name": "p_rate", "value":p_rate.cpu().item()},
             {"label": metric_label, "name": "n_rate", "value":n_rate.cpu().item()}]
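The training step relies on cb_id, the ID of the template with the smallest angular difference to the query orientation. How that ID is computed is not shown in the question. The sketch below is one common way to do it, assuming orientations are available as 3x3 rotation matrices: the geodesic angle between two rotations is `acos((trace(R_a^T R_b) - 1) / 2)`. The names `rotation_angle`, `nearest_template_id` and `rot_z` are hypothetical.

```python
import math

def rotation_angle(r_a, r_b):
    """Geodesic angle in radians between two 3x3 rotation matrices (nested lists)."""
    # trace(R_a^T R_b) equals the sum of the element-wise products
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)

def nearest_template_id(query_rot, template_rots):
    """Index of the template rotation with the smallest angle to the query."""
    return min(range(len(template_rots)),
               key=lambda i: rotation_angle(query_rot, template_rots[i]))

def rot_z(degrees):
    """Rotation about the z axis, used here only to build toy data."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

templates = [rot_z(0), rot_z(90), rot_z(180)]
print(nearest_template_id(rot_z(80), templates))  # 1
```

This metric ignores object symmetry, so for symmetric objects several templates can be equally valid matches even though only one ID is returned.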

Training loop:

init_train and init_verify just put the model into training or evaluation mode
train_step is the function shown above

def fit(epochs, models, init_train, init_verify, train_step, verify_step, opts, train_dl, verify_dl, eval_dl, codebook, vis_epoch_step=10):
    train_data = []
    verify_data = []
    eval_data = []

    for epoch in tqdm(range(epochs)):
        init_train(epoch, models)
        
        i = 0
        for data in train_dl:
            train_metrics = train_step(epoch, models, data, opts=opts, codebook=codebook, show=epoch % vis_epoch_step == 0 and i == 0)
            train_data.append(train_metrics)
            i = i + 1
            
            n = len(train_dl)
            p = round((i/n)*100)
            if p>0:
                sys.stdout.write('\r')
                bar_len = round(p/5)
                empty_len = round((100-p)/5)
                sys.stdout.write("Train batch %d/%d [%s%s] %d%%" % (i, n, '#'*bar_len, '_'*empty_len, p))
                sys.stdout.flush()
            
        # verification step
        init_verify(epoch, models)
        with torch.no_grad():
            
            i = 0
            for data in verify_dl:
                verify_metrics = verify_step(epoch, models, data, codebook=codebook, show=epoch % vis_epoch_step == 0 and i == 0)
                verify_data.append(verify_metrics)
                i = i + 1
            
                n = len(verify_dl)
                p = round((i/n)*100)
                if p>0:
                    sys.stdout.write('\r')
                    bar_len = round(p/5)
                    empty_len = round((100-p)/5)
                    sys.stdout.write("Verification batch %d/%d [%s%s] %d%%" % (i, n, '#'*bar_len, '_'*empty_len, p))
                    sys.stdout.flush()
...

Results with in-plane rotations of the positive template as negative templates

t_img_rand = torch.stack([rotate_image_tensor(y,np.random.random()*360) for y in t_img])

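Training (p/n_rate are the mean pos/neg case classifications):

Example cases:

p_cls: [0.998]

n_cls: [0.000]

Results with random templates as negative templates (same model initialization, optimizer, and hyperparameters):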

cb_id_rand = np.random.choice(codebook["size"],n)
t_img_rand = torch.stack([cb_get_img(i,codebook) for i in cb_id_rand]).to(device)

Training (p/n_rate are the mean pos/neg case classifications):

Example cases:

p_cls: [0.001]

n_cls: [0.998]

Answer 1

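I think I found the problem. Generating in-plane rotations of the positive template creates templates that are not part of the initial discrete template set. The model seems to memorize the initial template set and simply outputs 0.0 whenever the given template does not appear to belong to it.

Unfortunately, the "good" performance was just a form of overfitting. The "bad" performance is the honest one.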
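One way to probe this hypothesis is to check whether the score depends on the query at all. If the model only decides "is this template part of the set I memorized?", then pairing each template with a different query should leave the score almost unchanged. The sketch below measures that with toy scoring functions in which orientations are plain integer IDs. `query_sensitivity`, `template_only` and `pair_matcher` are hypothetical names; in practice `score_fn` would wrap the trained TemplateEvaluator.

```python
def query_sensitivity(score_fn, queries, templates):
    """Mean absolute score change when each template gets another sample's query."""
    # Pair every template with the next sample's query instead of its own
    shifted = queries[1:] + queries[:1]
    diffs = [abs(score_fn(q, t) - score_fn(s, t))
             for q, s, t in zip(queries, shifted, templates)]
    return sum(diffs) / len(diffs)

known_templates = {0, 1, 2, 3}

def template_only(query, template):
    # Toy stand-in for a model that only checks membership in a memorized set
    return 1.0 if template in known_templates else 0.0

def pair_matcher(query, template):
    # Toy stand-in for a model that really compares query and template
    return 1.0 if query == template else 0.0

queries = [0, 1, 2, 3]
templates = [0, 1, 2, 3]
print(query_sensitivity(template_only, queries, templates))  # 0.0
print(query_sensitivity(pair_matcher, queries, templates))   # 1.0
```

A sensitivity near zero on real data would be consistent with the memorization explanation above. A clearly positive value would suggest the model does use the query.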
