Inference time differs between GPUs when using Torch

Question

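I run into a problem with the inference code below. Inside the function recognize(), prediction finishes in 0.4 seconds. Returning the result preds_str to the calling function then takes another 3 seconds. I found that if I set gpu_id=0 in the config file, it returns instantly. How can I fix this? Thanks in advance.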

from time import time

import torch

# config, converter, Model, AlignCollate and ImageDataset are defined elsewhere in the project.

def recognize(imgs, model, demo_loader):
    t = time()
    model.eval()
    with torch.no_grad():
        for image_tensors, image_path_list in demo_loader:
            batch_size = image_tensors.size(0)
            image = image_tensors.to(config.device)
            # For max length prediction
            length_for_pred = torch.IntTensor([config.batch_max_length] * batch_size).to(config.device)
            text_for_pred = torch.LongTensor(batch_size, config.batch_max_length + 1).fill_(0).to(config.device)

            preds = model(image, text_for_pred, is_train=False)
            _, preds_index = preds.max(2)
            preds_str = converter.decode(preds_index, length_for_pred)

    print('time elapsed before return:', time() - t)  # 0.4s
    return preds_str

def main():
    model = Model()
    model.cuda(config.device)
    # output_device takes a single device, not a list
    model = torch.nn.DataParallel(model, device_ids=[config.device], output_device=config.device).to(config.device)
    model.load_state_dict(torch.load(config.saved_model, map_location=config.device))
    AlignCollate_demo = AlignCollate(imgH=config.imgH, imgW=config.imgW, keep_ratio_with_pad=config.PAD)
    # imgs = [img1, img2, ....]
    imgs_dataset = ImageDataset(imgs)
    demo_loader = torch.utils.data.DataLoader(imgs_dataset, batch_size=config.batch_size, shuffle=False, num_workers=int(config.workers), collate_fn=AlignCollate_demo, pin_memory=True)
    start_time = time()
    preds_str = recognize(imgs, model, demo_loader)
    print('time elapsed after return', time() - start_time)  # 3.4s

Config file:

    class ConfigWordRecognizer:
        gpu_id = 1 #troublesome line here
        device = torch.device('cuda:{}'.format(gpu_id) if torch.cuda.is_available() else 'cpu')
        imgH = 32
        imgW = 100
        batch_size = 80
        workers = 8
        batch_max_length = 25

Answer 1

I found a solution in this post. I set CUDA_VISIBLE_DEVICES=1 and gpu_id=0. Then I changed

model.load_state_dict(torch.load(config.saved_model, map_location=config.device))

to

self.model.load_state_dict(self.copyStateDict(torch.load(self.config.saved_model, map_location=self.config.device)))

The copyStateDict function:

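from collections import OrderedDict

# Method of the recognizer class (hence the self parameter).
def copyStateDict(self, state_dict):
    # A checkpoint saved from a DataParallel-wrapped model prefixes every key with "module."
    if list(state_dict.keys())[0].startswith("module"):
        start_idx = 1  # drop the leading "module" component
    else:
        start_idx = 0
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        name = ".".join(k.split(".")[start_idx:])
        new_state_dict[name] = v
    return new_state_dict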
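
To see what this renaming does without loading a real checkpoint, here is a minimal standalone sketch that works on a plain dictionary. The name strip_module_prefix and the sample keys are hypothetical and not part of the original code; it follows the same idea as copyStateDict but checks each key individually rather than only the first one. The "module." prefix shows up because torch.nn.DataParallel stores the wrapped network in an attribute called module, so a checkpoint saved from the wrapped model carries that prefix on every key.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading "module." removed from each key."""
    prefix = "module."
    new_state_dict = OrderedDict()
    for key, value in state_dict.items():
        # Strip the prefix only when the key actually starts with it.
        name = key[len(prefix):] if key.startswith(prefix) else key
        new_state_dict[name] = value
    return new_state_dict


# Hypothetical keys; the integers stand in for weight tensors.
wrapped = OrderedDict([("module.conv.weight", 1), ("module.conv.bias", 2)])
plain = strip_module_prefix(wrapped)
print(list(plain.keys()))
```

The stripped dictionary can then be passed to load_state_dict on a model that is not wrapped in DataParallel.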

The model now runs fine on GPU 1. But I still don't understand why, if I set gpu_id = 0, it also works fine on GPU 0 without copyStateDict.
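
For reference, the CUDA_VISIBLE_DEVICES part of the fix can also be done from inside the script, roughly as sketched here. This is only a sketch under assumptions: the helper name select_physical_gpu is hypothetical, and the variable has to be set before CUDA is initialized in the process (safest is before importing torch). Once only one GPU is visible, it is renumbered to index 0, which is why gpu_id=0 in the config then refers to physical GPU 1.

```python
import os


def select_physical_gpu(physical_id):
    """Expose only one physical GPU to this process and return the device string to use."""
    # Must run before CUDA is initialized (safest: before importing torch).
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_id)
    # The single visible GPU is always renumbered to index 0.
    return "cuda:0"


device_name = select_physical_gpu(1)  # physical GPU 1, as in the answer
print(os.environ["CUDA_VISIBLE_DEVICES"], device_name)
```

Setting the variable in the shell before launching the script has the same effect and avoids any dependence on import order.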
