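I am trying to modify the code at https://github.com/Microsoft/singleshotpose. It is a CNN that is trained to locate 9 keypoints: the 8 corners of the bounding box plus the centroid. I want to train on only 4 keypoints instead of the 9 used in the original code. I have been changing the code step by step, but I have now hit an error that I cannot resolve.
Here is the relevant snippet from the loss function: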
def build_targets(pred_corners, target, num_keypoints, num_anchors, num_classes, nH, nW, noobject_scale, object_scale, sil_thresh, seen):
    nB = target.size(0)
    nA = num_anchors
    nC = num_classes
    conf_mask = torch.ones(nB, nA, nH, nW) * noobject_scale
    coord_mask = torch.zeros(nB, nA, nH, nW)
    cls_mask = torch.zeros(nB, nA, nH, nW)
    txs = list()
    tys = list()
    for i in range(num_keypoints):
        txs.append(torch.zeros(nB, nA, nH, nW))
        tys.append(torch.zeros(nB, nA, nH, nW))
    tconf = torch.zeros(nB, nA, nH, nW)
    tcls = torch.zeros(nB, nA, nH, nW)
    num_labels = 2 * num_keypoints + 3 # +2 for width, height and +1 for class within label files
    nAnchors = nA*nH*nW
    nPixels = nH*nW
    for b in range(nB):
        cur_pred_corners = pred_corners[b*nAnchors:(b+1)*nAnchors].t()
        cur_confs = torch.zeros(nAnchors)
        for t in range(50):
            if target[b][t*num_labels+1] == 0:
                break
            g = list()
            for i in range(num_keypoints):
                g.append(target[b][t*num_labels+2*i+1])
                g.append(target[b][t*num_labels+2*i+2])
            cur_gt_corners = torch.FloatTensor(g).repeat(nAnchors,1).t() # 16 x nAnchors
            cur_confs = torch.max(cur_confs, corner_confidences(cur_pred_corners, cur_gt_corners)).view_as(conf_mask[b]) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
        conf_mask[b][cur_confs>sil_thresh] = 0
The error I get is:
File "c:\python_work\singleshotpose\region_loss2.py", line 56, in build_targets
cur_confs = torch.max(cur_confs, corner_confidences(cur_pred_corners, cur_gt_corners)).view_as(conf_mask[b]) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
RuntimeError: The size of tensor a (13) must match the size of tensor b (169) at non-singleton dimension 2
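It tells me that the tensor sizes differ. As far as I understand, the corner_confidences function computes the distances between the predicted and ground-truth corners and turns them into confidences; torch.max then returns the element-wise maximum of the initialized confidences and those new ones. When I leave out .view_as, the code works on the first iteration and stops on the second, at the comparison with sil_thresh.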
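The numbers in the message fit PyTorch's broadcasting rule, which lines shapes up starting from the last dimension. A plausible reading (an assumption, not something the traceback confirms) is that after the first pass through the `t` loop `cur_confs` has already been reshaped to `(1, 13, 13)` by `.view_as(conf_mask[b])`, while `corner_confidences` still returns a flat `(169,)` tensor. The pure-Python helper below, `broadcast_shape`, is a hypothetical stand-in that mimics the rule without needing torch:

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    Hypothetical helper that mimics the right-aligned broadcasting rule
    used by PyTorch and NumPy; it is not part of either library.
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have ndim entries.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x == y or x == 1 or y == 1:
            # Sizes are compatible: take the one that is not 1.
            out.append(y if x == 1 else x)
        else:
            raise ValueError(f"size {x} vs {y} at dimension {dim}")
    return tuple(out)


# First pass: both operands are flat, so an element-wise max is fine.
print(broadcast_shape((169,), (169,)))  # (169,)

# Second pass (assumed): cur_confs is now (1, 13, 13), the new values are (169,).
try:
    broadcast_shape((1, 13, 13), (169,))
except ValueError as err:
    print(err)  # size 13 vs 169 at dimension 2
```

The failing pair reports 13 against 169 at dimension 2, the same three numbers that appear in the RuntimeError.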
These are the sizes I get when I print .size() for gt_corners, pr_corners and conf_mask. I have also added the values of the other variables for context:
nB = 8, nA = 1, nC = 1, nH = 13, nW = 13, num_keypoints = 4
cur_pred_corners=torch.Size([8, 169])
cur_gt_corners=torch.Size([8, 169])
conf_mask=torch.Size([8, 1, 13, 13])
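With these sizes, one way to avoid mixing a reshaped tensor with a flat one is to keep the running maximum flat inside the loop and reshape it once before masking. The sketch below is self-contained and makes several assumptions: `fake_corner_confidences` is a hypothetical stand-in that returns one random value per anchor, the loop runs over three made-up targets, and `sil_thresh = 0.6` is an arbitrary value. It shows the shape handling only, not the real loss.

```python
import torch

nA, nH, nW = 1, 13, 13
nAnchors = nA * nH * nW
sil_thresh = 0.6  # arbitrary value chosen for this sketch

conf_mask_b = torch.ones(nA, nH, nW)  # plays the role of conf_mask[b]


def fake_corner_confidences(n):
    # Hypothetical stand-in: one confidence in [0, 1) per anchor, shape (n,).
    return torch.rand(n)


cur_confs = torch.zeros(nAnchors)  # flat, shape (169,)
for t in range(3):  # three made-up ground-truth targets
    new_confs = fake_corner_confidences(nAnchors)
    # Both operands are flat here, so the element-wise max never has to broadcast.
    cur_confs = torch.max(cur_confs, new_confs)

# Reshape once, after the loop, to match the mask before boolean indexing.
cur_confs = cur_confs.view_as(conf_mask_b)
conf_mask_b[cur_confs > sil_thresh] = 0

print(cur_confs.shape)  # torch.Size([1, 13, 13])
```

An equivalent option would be to reshape both arguments of `torch.max` to the same shape on every pass; what matters is that the two operands agree before they are compared.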
I am also including the corner_confidences function for clarity:
def corner_confidences(gt_corners, pr_corners, th=80, sharpness=2, im_width=640, im_height=480):
    '''
    gt_corners: Ground-truth 2D projections of the 3D bounding box corners, shape: (16 x nA), type: torch.FloatTensor
    pr_corners: Prediction for the 2D projections of the 3D bounding box corners, shape: (16 x nA), type: torch.FloatTensor
    th : distance threshold, type: int
    sharpness : sharpness of the exponential that assigns a confidence value to the distance
    -----------
    return : a torch.FloatTensor of shape (nA,) with 9 confidence values
    '''
    print(f"gt_corners: {gt_corners.size()}")
    print(f"pr_corners: {pr_corners.size()}")
    shape = gt_corners.size()
    torch.set_printoptions(threshold=sys.maxsize)
    print(f"test: {shape}")
    nA = shape[1]
    dist = gt_corners - pr_corners
    num_el = dist.numel()
    num_keypoints = num_el//(nA*2)
    dist = dist.t().contiguous().view(nA, num_keypoints, 2)
    dist[:, :, 0] = dist[:, :, 0] * im_width
    dist[:, :, 1] = dist[:, :, 1] * im_height
    eps = 1e-5
    distthresh = torch.FloatTensor([th]).repeat(nA, num_keypoints)
    dist = torch.sqrt(torch.sum((dist)**2, dim=2)).squeeze() # nA x 9
    mask = (dist < distthresh).type(torch.FloatTensor)
    conf = torch.exp(sharpness*(1 - dist/distthresh))-1 # mask * (torch.exp(math.log(2) * (1.0 - dist/rrt)) - 1)
    conf0 = torch.exp(sharpness*(1 - torch.zeros(conf.size(0),1))) - 1
    conf = conf / conf0.repeat(1, num_keypoints)
    # conf = 1 - dist/distthresh
    conf = mask * conf # nA x 9
    mean_conf = torch.mean(conf, dim=1)
    return mean_conf
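To make the confidence formula easier to check by hand, here is a scalar version for a single keypoint. `keypoint_confidence` is a hypothetical helper written for illustration; it assumes normalized coordinate differences `dx` and `dy` and reuses the default arguments of `corner_confidences` above.

```python
import math


def keypoint_confidence(dx, dy, th=80, sharpness=2, im_width=640, im_height=480):
    """Confidence for one keypoint from normalized offsets dx, dy.

    Hypothetical scalar counterpart of the per-keypoint step in
    corner_confidences; the function above then averages over keypoints.
    """
    # Convert the normalized offsets to a pixel distance.
    dist = math.hypot(dx * im_width, dy * im_height)
    if dist >= th:
        return 0.0  # the mask zeroes anything at or beyond the threshold
    # Exponential falloff, normalized so that dist == 0 gives exactly 1.
    return (math.exp(sharpness * (1 - dist / th)) - 1) / (math.exp(sharpness) - 1)


print(keypoint_confidence(0.0, 0.0))  # 1.0
print(keypoint_confidence(0.5, 0.0))  # 0.0, since 320 px is beyond th=80
```

Because the result is one number per keypoint, averaging over keypoints gives one value per anchor regardless of whether there are 9 keypoints or 4.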
I have been stuck on this error for a few days and cannot find a solution. I hope someone can help! Thanks, everyone.