Vivado HLS loop unrolling is sequential

Question (votes: 0, answers: 1)

I have a fully connected layer function that I want to parallelize in Vivado HLS.

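As the code below shows, the loop I care about is 'input_loop:', and I have set a directive to unroll it by a factor of 16. Vivado HLS does unroll it, and I can see that 16 multipliers are created, but the unrolled iterations are scheduled sequentially rather than in parallel. [image: schedule view]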

What am I missing? I want this loop to be replicated 16 times, with the copies running side by side.
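For reference, the source-level effect of a partial unroll can be sketched by hand. The example below is only an illustration with made-up names and sizes (`N`, `FACTOR`, `multiply_rolled`, `multiply_unrolled` are not from the code in this question): it uses a factor of 4 instead of 16 and plain integers instead of `ap_fixed`. Unrolling replicates the loop body, but the copies can only be scheduled in the same cycle if the arrays they touch offer enough independent read and write ports.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sizes, chosen for illustration only.
const int N = 16;
const int FACTOR = 4;

// Rolled form: one multiply per loop iteration.
void multiply_rolled(const int16_t a[N], const int16_t b[N], int32_t out[N]) {
    for (int i = 0; i < N; i++) {
        out[i] = a[i] * b[i];
    }
}

// Hand-written equivalent of a partial unroll by FACTOR (4 here):
// each pass through the loop holds four independent multiplies.
void multiply_unrolled(const int16_t a[N], const int16_t b[N], int32_t out[N]) {
    for (int i = 0; i < N; i += FACTOR) {
        out[i]     = a[i]     * b[i];
        out[i + 1] = a[i + 1] * b[i + 1];
        out[i + 2] = a[i + 2] * b[i + 2];
        out[i + 3] = a[i + 3] * b[i + 3];
    }
}

// Software check that both forms compute the same products.
void self_check() {
    int16_t a[N], b[N];
    int32_t rolled[N], unrolled[N];
    for (int i = 0; i < N; i++) {
        a[i] = static_cast<int16_t>(i - 3);
        b[i] = static_cast<int16_t>(2 * i + 1);
    }
    multiply_rolled(a, b, rolled);
    multiply_unrolled(a, b, unrolled);
    for (int i = 0; i < N; i++) {
        assert(rolled[i] == unrolled[i]);
    }
}
```

The manual rewrite is only valid because `N` is a multiple of `FACTOR`; with a trip count that is not known to be a multiple, extra exit checks are needed between the replicated bodies.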

My code is as follows:

#include <algorithm>
#include "fc_layer.h"
#include <stdio.h>
#include <string.h>
typedef ap_fixed<64,32,AP_RND> db_word_fixedpt;
typedef ap_fixed<32,16,AP_RND> word_fixedpt;
typedef ap_fixed<16,8,AP_RND> small_fixedpt;
void fc_layer(small_fixedpt weights[MAX_INPUT_SIZE*MAX_OUTPUT_SIZE],
        small_fixedpt biases[MAX_OUTPUT_SIZE],
        small_fixedpt input[MAX_INPUT_SIZE*MAX_BATCH],
              small_fixedpt output[MAX_OUTPUT_SIZE*MAX_BATCH],
              const int batch_size,
              const int num_inputs,
              const int num_outputs)
{
  // Batch Iterator
  for (int b = 0; b < batch_size; b++) {
  #pragma HLS loop_tripcount min=1 max=10
    // Output Node Iterator
    array_cpy:for (int o = 0; o < num_outputs; o++) {
    #pragma HLS loop_tripcount min=1 max=1024
      // Set bias
        small_fixedpt output_fixp = 0;
      //output_fixp = biases[o];
      //float input_sub_array[1024] = input[o*num_inputs:o*num_inputs+1024];
      small_fixedpt input_sub_array[1024] = {0};
      small_fixedpt weight_sub_array[1024] = {0};
      small_fixedpt output_sub_array[1024] = {0};
      small_fixedpt output_sub_array_stg2[64] = {0};
      subcopy:for(int i = 0; i < 1024; i++) {
          input_sub_array[i] = input[b*num_inputs+i];
          weight_sub_array[i] = weights[o*num_inputs+i];
      }
      // Accumulate weighted sum
      input_loop:for (int i = 0; i < std::min(num_inputs,MAX_INPUT_SIZE); i++) {
      #pragma HLS loop_tripcount min=1 max=1024
          output_sub_array[i] = input_sub_array[i]*weights[i];
      }
      output[b*num_outputs+o] = biases[o];
      for(int i = 0; i < 64; i++) {
          output_sub_array_stg2[i] = output_sub_array[16*i] + output_sub_array[16*i+1] \
                                   + output_sub_array[16*i+2] + output_sub_array[16*i+3] \
                                   + output_sub_array[16*i+4] + output_sub_array[16*i+5] \
                                   + output_sub_array[16*i+6] + output_sub_array[16*i+7] \
                                   + output_sub_array[16*i+8] + output_sub_array[16*i+9] \
                                   + output_sub_array[16*i+10] + output_sub_array[16*i+11] \
                                   + output_sub_array[16*i+12] + output_sub_array[16*i+13] \
                                   + output_sub_array[16*i+14] + output_sub_array[16*i+15];
      }
      for(int i = 0; i < 64; i++) {
          output[b*num_outputs+o] += output_sub_array_stg2[i];
      }
    }
  }
}

Directives file:

############################################################
## This file is generated automatically by Vivado HLS.
## Please DO NOT edit it.
## Copyright (C) 1986-2017 Xilinx, Inc. All Rights Reserved.
############################################################
set_directive_unroll -factor 16 "fc_layer/input_loop"
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer" input
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer" weights
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer/array_cpy" input_sub_array
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer/array_cpy" weight_sub_array
set_directive_unroll -factor 16 "fc_layer/subcopy"
set_directive_array_partition -type cyclic -factor 16 -dim 1 "fc_layer/array_cpy" output_sub_array
set_directive_resource -core RAM_S2P_LUTRAM "fc_layer/array_cpy" output_sub_array
set_directive_resource -core RAM_S2P_LUTRAM "fc_layer/array_cpy" weight_sub_array
set_directive_resource -core RAM_S2P_LUTRAM "fc_layer/array_cpy" input_sub_array
set_directive_resource -core RAM_T2P_BRAM "fc_layer" weights
set_directive_resource -core RAM_T2P_BRAM "fc_layer" biases
set_directive_resource -core RAM_T2P_BRAM "fc_layer" input

Here is the log output:

Starting C synthesis ...
/opt/Xilinx/Vivado_HLS/2017.2/bin/vivado_hls /workspace/REDACTED
INFO: [HLS 200-10] Running '/opt/Xilinx/Vivado_HLS/2017.2/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'root' on host '3310c2d0e0d4' (Linux_x86_64 version 4.9.125-linuxkit) on Fri Feb 08 20:06:44 UTC 2019
INFO: [HLS 200-10] In directory REDACTED
INFO: [HLS 200-10] Opening project REDACTED
INFO: [HLS 200-10] Adding design file '../fc_test/fc_layer.cpp' to the project
INFO: [HLS 200-10] Adding test bench file '../fc_test/fc_layer_test.cpp' to the project
INFO: [HLS 200-10] Adding test bench file '../util/shared.cpp' to the project
INFO: [HLS 200-10] Adding test bench file '../nn_params' to the project
INFO: [HLS 200-10] Opening solution REDACTED
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
INFO: [HLS 200-10] Setting target device to 'xcvu095-ffvc1517-2-e'
INFO: [HLS 200-10] Analyzing design file '../fc_test/fc_layer.cpp' ... 
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:01:48 ; elapsed = 00:00:57 . Memory (MB): peak = 345.266 ; gain = 12.586 ; free physical = 3149 ; free virtual = 4919
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:01:50 ; elapsed = 00:00:59 . Memory (MB): peak = 345.266 ; gain = 12.586 ; free physical = 3141 ; free virtual = 4918
INFO: [HLS 200-10] Starting code transformations ...
INFO: [XFORM 203-603] Inlining function 'std::min<int>' into 'fc_layer' (../fc_test/fc_layer.cpp:35).
INFO: [XFORM 203-603] Inlining function 'ap_fixed_base<16, 8, true, (ap_q_mode)0, (ap_o_mode)3, 0>::quantization_adjust' into 'ap_fixed_base<16, 8, true, (ap_q_mode)0, (ap_o_mode)3, 0>::ap_fixed_base<32, 16, true, (ap_q_mode)5, (ap_o_mode)3, 0>' ().
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:01:52 ; elapsed = 00:01:01 . Memory (MB): peak = 346.039 ; gain = 13.359 ; free physical = 3118 ; free virtual = 4900
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:01:52 ; elapsed = 00:01:02 . Memory (MB): peak = 473.895 ; gain = 141.215 ; free physical = 3107 ; free virtual = 4891
INFO: [XFORM 203-501] Unrolling loop 'subcopy' (../fc_test/fc_layer.cpp:30) in function 'fc_layer' partially with a factor of 16.
INFO: [XFORM 203-501] Unrolling loop 'input_loop' (../fc_test/fc_layer.cpp:35) in function 'fc_layer' partially with a factor of 16.
INFO: [XFORM 203-101] Partitioning array 'weights.V' (../fc_test/fc_layer.cpp:8) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'input.V' (../fc_test/fc_layer.cpp:10) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'input_sub_array.V' (../fc_test/fc_layer.cpp:26) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'weight_sub_array.V' (../fc_test/fc_layer.cpp:27) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-101] Partitioning array 'output_sub_array.V' (../fc_test/fc_layer.cpp:28) in dimension 1 with a cyclic factor 16.
INFO: [XFORM 203-11] Balancing expressions in function 'fc_layer' (../fc_test/fc_layer.cpp:8)...15 expression(s) balanced.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:01:55 ; elapsed = 00:01:05 . Memory (MB): peak = 473.895 ; gain = 141.215 ; free physical = 3069 ; free virtual = 4859
INFO: [HLS 200-111] Finished Architecture Synthesis Time (s): cpu = 00:01:59 ; elapsed = 00:01:09 . Memory (MB): peak = 473.895 ; gain = 141.215 ; free physical = 3063 ; free virtual = 4855
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'fc_layer' ...
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Implementing module 'fc_layer' 
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111]  Elapsed time: 73.78 seconds; current allocated memory: 108.265 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111]  Elapsed time: 3.42 seconds; current allocated memory: 112.243 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'fc_layer' 
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_0_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_1_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_2_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_3_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_4_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_5_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_6_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_7_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_8_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_9_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_10_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_11_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_12_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_13_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_14_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/weights_15_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/biases_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_0_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_1_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_2_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_3_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_4_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_5_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_6_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_7_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_8_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_9_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_10_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_11_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_12_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_13_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_14_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/input_15_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/output_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/batch_size' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/num_inputs' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'fc_layer/num_outputs' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on function 'fc_layer' to 'ap_ctrl_hs'.
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_0_V' to 'fc_layer_input_subkb' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_1_V' to 'fc_layer_input_sucud' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_2_V' to 'fc_layer_input_sudEe' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_3_V' to 'fc_layer_input_sueOg' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_4_V' to 'fc_layer_input_sufYi' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_5_V' to 'fc_layer_input_sug8j' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_6_V' to 'fc_layer_input_suhbi' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_7_V' to 'fc_layer_input_suibs' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_8_V' to 'fc_layer_input_sujbC' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_9_V' to 'fc_layer_input_sukbM' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_10_s' to 'fc_layer_input_sulbW' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_11_s' to 'fc_layer_input_sumb6' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_12_s' to 'fc_layer_input_suncg' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_13_s' to 'fc_layer_input_suocq' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_14_s' to 'fc_layer_input_supcA' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_input_sub_array_15_s' to 'fc_layer_input_suqcK' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_0_s' to 'fc_layer_output_srcU' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_1_s' to 'fc_layer_output_ssc4' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_2_s' to 'fc_layer_output_stde' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_3_s' to 'fc_layer_output_sudo' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_4_s' to 'fc_layer_output_svdy' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_5_s' to 'fc_layer_output_swdI' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_6_s' to 'fc_layer_output_sxdS' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_7_s' to 'fc_layer_output_syd2' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_8_s' to 'fc_layer_output_szec' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_9_s' to 'fc_layer_output_sAem' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_10' to 'fc_layer_output_sBew' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_11' to 'fc_layer_output_sCeG' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_12' to 'fc_layer_output_sDeQ' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_13' to 'fc_layer_output_sEe0' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_14' to 'fc_layer_output_sFfa' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_15' to 'fc_layer_output_sGfk' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_output_sub_array_stg' to 'fc_layer_output_sHfu' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_mux_164_16_1' to 'fc_layer_mux_164_IfE' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'fc_layer_mul_mul_16s_16s_32_1' to 'fc_layer_mul_mul_JfO' due to the length limit 20
INFO: [RTGEN 206-100] Generating core module 'fc_layer_mul_mul_JfO': 16 instance(s).
INFO: [RTGEN 206-100] Generating core module 'fc_layer_mux_164_IfE': 16 instance(s).
INFO: [RTGEN 206-100] Finished creating RTL model for 'fc_layer'.
INFO: [HLS 200-111]  Elapsed time: 4.51 seconds; current allocated memory: 117.723 MB.
INFO: [RTMG 210-278] Implementing memory 'fc_layer_input_subkb_ram' using block RAMs.
INFO: [HLS 200-111] Finished generating all RTL models Time (s): cpu = 00:02:21 ; elapsed = 00:01:36 . Memory (MB): peak = 537.262 ; gain = 204.582 ; free physical = 3021 ; free virtual = 4826
INFO: [SYSC 207-301] Generating SystemC RTL for fc_layer.
INFO: [VHDL 208-304] Generating VHDL RTL for fc_layer.
INFO: [VLOG 209-307] Generating Verilog RTL for fc_layer.
INFO: [HLS 200-112] Total elapsed time: 96.25 seconds; peak allocated memory: 117.723 MB.
Finished C synthesis.

Does anyone know what I am missing here?

vivado
1 Answer
0
votes

I'm afraid the number of iterations has to be known at compile time. The arrays may also need explicit array partition pragmas.
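A minimal sketch of what this suggestion could look like, under several assumptions: `int16_t`/`int32_t` stand in for the `ap_fixed` types so the snippet compiles without the Xilinx headers, `MAX_INPUT_SIZE` is assumed to be 1024, and the function name `multiply_inputs` is made up for illustration. The loop bound is a compile-time constant, the run-time `num_inputs` only gates the value that is written, and the partitioning is stated with in-source pragmas instead of the directives file. Whether this alone yields a parallel schedule has not been verified here; it still depends on the tool version and the memory ports available.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the question's fixed-point types (assumption, so that the
// sketch builds with an ordinary C++ compiler).
typedef int16_t small_t;
typedef int32_t wide_t;

// Assumed value; the question keeps the real constant in fc_layer.h.
const int MAX_INPUT_SIZE = 1024;

// Element-wise products for one output node.
void multiply_inputs(const small_t in[MAX_INPUT_SIZE],
                     const small_t w[MAX_INPUT_SIZE],
                     wide_t prod[MAX_INPUT_SIZE],
                     int num_inputs) {
// Explicit partitioning so that 16 elements can be accessed per cycle.
#pragma HLS array_partition variable=in cyclic factor=16 dim=1
#pragma HLS array_partition variable=w cyclic factor=16 dim=1
#pragma HLS array_partition variable=prod cyclic factor=16 dim=1
    // The trip count is the constant MAX_INPUT_SIZE, a multiple of 16.
    for (int i = 0; i < MAX_INPUT_SIZE; i++) {
#pragma HLS unroll factor=16
        // The run-time size only selects the value; it does not end the loop.
        prod[i] = (i < num_inputs) ? wide_t(in[i]) * wide_t(w[i]) : wide_t(0);
    }
}
```

An ordinary compiler ignores the `#pragma HLS` lines (possibly with an unknown-pragma warning), so the function can be checked in a software testbench before synthesis. The cost of the fixed bound is that all `MAX_INPUT_SIZE` iterations always execute, even when `num_inputs` is small.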
