Skipping backpropagation during CNN training to speed up Caffe training

Problem description (votes: 0, answers: 1)

My aim is to speed up training of a new model by reusing, for the layers it shares with a previously trained model, that model's weights.

Say I have two models, the first model and the second model.

The first model was trained from scratch.

The second model differs from the first only in two additional layers (conv_1_1 and relu_1_1); the remaining layers are identical.

So I would like to backpropagate only through the layers that differ from the first model, and not retrain the layers that are the same, the intent being to speed up training of the second model.

To do this, I set lr_mult and decay_mult to 0 on the layers shared with the first model.
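Concretely, each shared layer carries zeroed multipliers for both of its parameter blobs, exactly as in the conv_1 layer of the Train.prototxt below:

param {
  # first param blob (weights): learning and weight decay disabled
  lr_mult: 0.0
  decay_mult: 0.0
}
param {
  # second param blob (bias): learning and weight decay disabled
  lr_mult: 0.0
  decay_mult: 0.0
}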

However, I found that training the second model still takes about as long as training the first.

I suspect the gradients are still computed and merely multiplied by 0, so even though the weights are not updated, the computation still takes place.

How can I skip backpropagation in those identical layers?

I checked the log file and found:

I0830 13:40:26.546422 10580 net.cpp:226] mbox_loss needs backward computation.
I0830 13:40:26.546432 10580 net.cpp:228] mbox_priorbox does not need backward computation.
I0830 13:40:26.546437 10580 net.cpp:226] mbox_conf needs backward computation.
I0830 13:40:26.546440 10580 net.cpp:226] mbox_loc needs backward computation.
I0830 13:40:26.546444 10580 net.cpp:228] conv_6_norm_mbox_priorbox does not need backward computation.
I0830 13:40:26.546448 10580 net.cpp:226] conv_6_norm_mbox_conf_flat needs backward computation.
I0830 13:40:26.546452 10580 net.cpp:226] conv_6_norm_mbox_conf_perm needs backward computation.
I0830 13:40:26.546455 10580 net.cpp:226] conv_6_norm_mbox_conf needs backward computation.
I0830 13:40:26.546460 10580 net.cpp:226] conv_6_norm_mbox_loc_flat needs backward computation.
I0830 13:40:26.546464 10580 net.cpp:226] conv_6_norm_mbox_loc_perm needs backward computation.
I0830 13:40:26.546468 10580 net.cpp:226] conv_6_norm_mbox_loc needs backward computation.
I0830 13:40:26.546471 10580 net.cpp:226] conv_6_norm_conv_6_norm_0_split needs backward computation.
I0830 13:40:26.546475 10580 net.cpp:226] conv_6_norm needs backward computation.
I0830 13:40:26.546478 10580 net.cpp:226] relu_6 needs backward computation.
I0830 13:40:26.546481 10580 net.cpp:226] conv_6 needs backward computation.
I0830 13:40:26.546485 10580 net.cpp:226] pool_5 needs backward computation.
I0830 13:40:26.546489 10580 net.cpp:226] relu_5 needs backward computation.
I0830 13:40:26.546492 10580 net.cpp:226] conv_5 needs backward computation.
I0830 13:40:26.546495 10580 net.cpp:226] pool_4 needs backward computation.
I0830 13:40:26.546499 10580 net.cpp:226] relu_4 needs backward computation.
I0830 13:40:26.546502 10580 net.cpp:226] conv_4 needs backward computation.
I0830 13:40:26.546505 10580 net.cpp:226] pool_3 needs backward computation.
I0830 13:40:26.546509 10580 net.cpp:226] relu_3 needs backward computation.
I0830 13:40:26.546512 10580 net.cpp:226] conv_3 needs backward computation.
I0830 13:40:26.546515 10580 net.cpp:226] pool_2 needs backward computation.
I0830 13:40:26.546519 10580 net.cpp:226] relu_2 needs backward computation.
I0830 13:40:26.546522 10580 net.cpp:226] conv_2 needs backward computation.
I0830 13:40:26.546525 10580 net.cpp:226] pool_1 needs backward computation.
I0830 13:40:26.546530 10580 net.cpp:226] relu_1_1 needs backward computation.
I0830 13:40:26.546532 10580 net.cpp:226] conv_1_1 needs backward computation.
I0830 13:40:26.546536 10580 net.cpp:228] relu_1 does not need backward computation.
I0830 13:40:26.546540 10580 net.cpp:228] conv_1 does not need backward computation.
I0830 13:40:26.546545 10580 net.cpp:228] data_data_0_split does not need backward computation.
I0830 13:40:26.546548 10580 net.cpp:228] data does not need backward computation.

So only conv_1 and relu_1 skip the backward pass; all the other layers are still backpropagated through.

How can I turn off backpropagation in the following layers?

conv_6_norm_conv_6_norm_0_split, conv_6_norm, relu_6, conv_6, pool_5, relu_5, conv_5, pool_4, relu_4, conv_4, pool_3, relu_3, conv_3, pool_2, relu_2, conv_2

The Train.prototxt file is as follows.

name: "RegNet_train_0"
layer {
  name: "data"
  type: "AnnotatedData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    mean_value: 104.0
    mean_value: 117.0
    mean_value: 123.0
    resize_param {
      prob: 1.0
      resize_mode: WARP
      height: 480
      width: 480
      interp_mode: LINEAR
      interp_mode: AREA
      interp_mode: NEAREST
      interp_mode: CUBIC
      interp_mode: LANCZOS4
      height_scale: 480
      width_scale: 480
    }
    emit_constraint {
      emit_type: CENTER
    }
    distort_param {
      brightness_prob: 0.5
      brightness_delta: 32.0
      contrast_prob: 0.5
      contrast_lower: 0.5
      contrast_upper: 1.5
      hue_prob: 0.5
      hue_delta: 18.0
      saturation_prob: 0.5
      saturation_lower: 0.5
      saturation_upper: 1.5
      random_order_prob: 0.0
    }
    expand_param {
      prob: 0.5
      max_expand_ratio: 4.0
    }
  }
  data_param {
    source: "/home/coie/data/NumberPlate/lmdb/Nextan_trainval_lmdb"
    batch_size: 16
    backend: LMDB
  }
  annotated_data_param {
    batch_sampler {
      max_sample: 1
      max_trials: 1
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.10000000149
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.300000011921
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.5
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.699999988079
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.899999976158
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        max_jaccard_overlap: 1.0
      }
      max_sample: 1
      max_trials: 50
    }
    label_map_file: "/home/coie/data/NumberPlate/labelmap_NumberPlate.prototxt"
  }
}
layer {
  name: "conv_1"
  type: "Convolution"
  bottom: "data"
  top: "conv_1"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 8
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu_1"
  type: "ReLU"
  bottom: "conv_1"
  top: "conv_1"
}
layer {
  name: "conv_1_1"
  type: "Convolution"
  bottom: "conv_1"
  top: "conv_1_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 8
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu_1_1"
  type: "ReLU"
  bottom: "conv_1_1"
  top: "conv_1_1"
}
layer {
  name: "pool_1"
  type: "Pooling"
  bottom: "conv_1_1"
  top: "pool_1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv_2"
  type: "Convolution"
  bottom: "pool_1"
  top: "conv_2"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 8
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu_2"
  type: "ReLU"
  bottom: "conv_2"
  top: "conv_2"
}
layer {
  name: "pool_2"
  type: "Pooling"
  bottom: "conv_2"
  top: "pool_2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv_3"
  type: "Convolution"
  bottom: "pool_2"
  top: "conv_3"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 16
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu_3"
  type: "ReLU"
  bottom: "conv_3"
  top: "conv_3"
}
layer {
  name: "pool_3"
  type: "Pooling"
  bottom: "conv_3"
  top: "pool_3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv_4"
  type: "Convolution"
  bottom: "pool_3"
  top: "conv_4"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 16
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu_4"
  type: "ReLU"
  bottom: "conv_4"
  top: "conv_4"
}
layer {
  name: "pool_4"
  type: "Pooling"
  bottom: "conv_4"
  top: "pool_4"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv_5"
  type: "Convolution"
  bottom: "pool_4"
  top: "conv_5"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 32
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu_5"
  type: "ReLU"
  bottom: "conv_5"
  top: "conv_5"
}
layer {
  name: "pool_5"
  type: "Pooling"
  bottom: "conv_5"
  top: "pool_5"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv_6"
  type: "Convolution"
  bottom: "pool_5"
  top: "conv_6"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 32
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu_6"
  type: "ReLU"
  bottom: "conv_6"
  top: "conv_6"
}
layer {
  name: "conv_6_norm"
  type: "Normalize"
  bottom: "conv_6"
  top: "conv_6_norm"
  norm_param {
    across_spatial: false
    scale_filler {
      type: "constant"
      value: 20.0
    }
    channel_shared: false
  }
}
layer {
  name: "conv_6_norm_mbox_loc"
  type: "Convolution"
  bottom: "conv_6_norm"
  top: "conv_6_norm_mbox_loc"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 12
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv_6_norm_mbox_loc_perm"
  type: "Permute"
  bottom: "conv_6_norm_mbox_loc"
  top: "conv_6_norm_mbox_loc_perm"
  permute_param {
    order: 0
    order: 2
    order: 3
    order: 1
  }
}
layer {
  name: "conv_6_norm_mbox_loc_flat"
  type: "Flatten"
  bottom: "conv_6_norm_mbox_loc_perm"
  top: "conv_6_norm_mbox_loc_flat"
  flatten_param {
    axis: 1
  }
}
layer {
  name: "conv_6_norm_mbox_conf"
  type: "Convolution"
  bottom: "conv_6_norm"
  top: "conv_6_norm_mbox_conf"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 6
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv_6_norm_mbox_conf_perm"
  type: "Permute"
  bottom: "conv_6_norm_mbox_conf"
  top: "conv_6_norm_mbox_conf_perm"
  permute_param {
    order: 0
    order: 2
    order: 3
    order: 1
  }
}
layer {
  name: "conv_6_norm_mbox_conf_flat"
  type: "Flatten"
  bottom: "conv_6_norm_mbox_conf_perm"
  top: "conv_6_norm_mbox_conf_flat"
  flatten_param {
    axis: 1
  }
}
layer {
  name: "conv_6_norm_mbox_priorbox"
  type: "PriorBox"
  bottom: "conv_6_norm"
  bottom: "data"
  top: "conv_6_norm_mbox_priorbox"
  prior_box_param {
    min_size: 25.6000003815
    max_size: 48.0
    aspect_ratio: 3.0
    flip: false
    clip: false
    variance: 0.10000000149
    variance: 0.10000000149
    variance: 0.20000000298
    variance: 0.20000000298
    img_size: 480
    step: 32.0
    offset: 0.5
  }
}
layer {
  name: "mbox_loc"
  type: "Concat"
  bottom: "conv_6_norm_mbox_loc_flat"
  top: "mbox_loc"
  concat_param {
    axis: 1
  }
}
layer {
  name: "mbox_conf"
  type: "Concat"
  bottom: "conv_6_norm_mbox_conf_flat"
  top: "mbox_conf"
  concat_param {
    axis: 1
  }
}
layer {
  name: "mbox_priorbox"
  type: "Concat"
  bottom: "conv_6_norm_mbox_priorbox"
  top: "mbox_priorbox"
  concat_param {
    axis: 2
  }
}
layer {
  name: "mbox_loss"
  type: "MultiBoxLoss"
  bottom: "mbox_loc"
  bottom: "mbox_conf"
  bottom: "mbox_priorbox"
  bottom: "label"
  top: "mbox_loss"
  include {
    phase: TRAIN
  }
  propagate_down: true
  propagate_down: true
  propagate_down: false
  propagate_down: false
  loss_param {
    normalization: VALID
  }
  multibox_loss_param {
    loc_loss_type: SMOOTH_L1
    conf_loss_type: SOFTMAX
    loc_weight: 1.0
    num_classes: 2
    share_location: true
    match_type: PER_PREDICTION
    overlap_threshold: 0.5
    use_prior_for_matching: true
    background_label_id: 1
    use_difficult_gt: true
    neg_pos_ratio: 3.0
    neg_overlap: 0.5
    code_type: CENTER_SIZE
    ignore_cross_boundary_bbox: false
    mining_type: MAX_NEGATIVE
  }
}
deep-learning conv-neural-network caffe
1 Answer

0 votes

The Caffe library has param_need_backward, layer_need_backward_ and blob_need_backward_ flags.

param_need_backward controls the weight update: when lr_mult is set to 0, the corresponding entry for that layer's parameter is false, and conv_layer.cpp does not update the layer's weights because param_propagate_down_ is false.
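This mapping is established once, when the net is constructed. A sketch of the relevant lines of Net<Dtype>::Init in net.cpp (paraphrased and simplified, not verbatim; the surrounding bookkeeping is omitted):

// A parameter blob needs a gradient only if its lr_mult is non-zero.
const bool param_need_backward = param_spec->lr_mult() != 0;
// The layer as a whole needs backward if ANY of its parameters does.
need_backward |= param_need_backward;
// Push the flag into the layer, where it is stored as param_propagate_down_.
layers_[layer_id]->set_param_propagate_down(param_id,
                                            param_need_backward);

With that flag false, ConvolutionLayer skips only the weight-gradient half of its backward pass in conv_layer.cpp: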

if (this->param_propagate_down_[0] || propagate_down[i]) {
  for (int n = 0; n < this->num_; ++n) {
    // gradient w.r.t. weight. Note that we will accumulate diffs.
    if (this->param_propagate_down_[0]) {
      this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
          top_diff + n * this->top_dim_, weight_diff);
    }
    // gradient w.r.t. bottom data, if necessary.
    if (propagate_down[i]) {
      this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
          bottom_diff + n * this->bottom_dim_);
    }
  }
}
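Note the second branch: even when param_propagate_down_[0] is false, backward_cpu_gemm still runs whenever propagate_down[i] is true, that is, whenever some layer below still needs the gradient. This is exactly why your frozen middle layers keep consuming backward time: with lr_mult: 0 they skip only the weight-gradient GEMM, not the bottom-gradient GEMM.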

layer_need_backward_ and blob_need_backward_ are per-layer and per-blob flags, set in Net::Init according to the network architecture. A layer is marked as needing backward if any of its own parameters is learnable, or if any of its bottom blobs must carry a gradient because some layer below it is still learnable. In your net the trainable conv_1_1 sits below conv_2 through conv_6, so all of those frozen layers still have to run their backward pass just to carry the gradient down to conv_1_1; lr_mult: 0 skips only their weight-gradient computation.
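If nothing below a given layer needs gradients (for example, if your two new layers had been added at the top of the net rather than the bottom), you can cut the gradient flow explicitly with LayerParameter's repeated bool propagate_down field, the same field your mbox_loss layer already uses. A hypothetical sketch (do not use this in your current net: it would also cut the gradient off from conv_1_1):

layer {
  name: "conv_2"
  type: "Convolution"
  bottom: "pool_1"
  top: "conv_2"
  # Skip the gradient w.r.t. pool_1. Combined with lr_mult: 0 on both
  # params, conv_2 then needs no backward computation at all.
  propagate_down: false
  param { lr_mult: 0.0 decay_mult: 0.0 }
  param { lr_mult: 0.0 decay_mult: 0.0 }
  convolution_param { num_output: 8 pad: 1 kernel_size: 3 }
}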
