Understanding device placement, parallelism (tf.while_loop), and tf.function in TensorFlow

Question

I am trying to understand parallelism on the GPU in TensorFlow, because I need to apply it to uglier graphs.

import tensorflow as tf
from datetime import datetime

with tf.device('/device:GPU:0'):
    var = tf.Variable(tf.ones([100000], dtype=tf.dtypes.float32), dtype=tf.dtypes.float32)

@tf.function
def foo():
    return tf.while_loop(c, b, [i], parallel_iterations=1000)      #tweak

@tf.function
def b(i):
    var.assign(tf.tensor_scatter_nd_update(var, tf.reshape(i, [-1,1]), tf.constant([0], dtype=tf.dtypes.float32)))
    return tf.add(i,1)

with tf.device('/device:GPU:0'):
    i = tf.constant(0)
    c = lambda i: tf.less(i,100000)

start = datetime.today()
with tf.device('/device:GPU:0'):
    foo()
print(datetime.today()-start)

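In the code above, var is a tensor of length 100000 whose elements are updated one at a time as shown. When I change parallel_iterations among 10, 100, 1000, and 10000, there is almost no difference in run time (about 9.8 s in every case), even though parallel_iterations is set explicitly.

I want these updates to happen in parallel on the GPU. How can I achieve that?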

Answer 1

One technique is to use a distribution strategy and its scope:

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  # Plain gradient descent through the TF 2.x Keras optimizer API
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))

Another option is to replicate the operations on each device:

# Replicate your computation on multiple GPUs
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
# Combine the per-GPU results on the CPU
with tf.device('/cpu:0'):
  total = tf.add_n(c)
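
The snippet above hardcodes GPU:2 and GPU:3, which only exist on a machine with at least four GPUs. A minimal sketch of the same replication pattern that asks TensorFlow which GPUs are visible could look like the following. The CPU fallback and the names devices, partials, and total are assumptions made for illustration; they are not part of the original answer.

```python
import tensorflow as tf

# Ask TensorFlow which GPUs it can see instead of hardcoding device names.
gpus = tf.config.list_logical_devices('GPU')
# Assumption: fall back to the CPU so the sketch also runs without a GPU.
devices = [d.name for d in gpus] or ['/device:CPU:0']

partials = []
for d in devices:
    with tf.device(d):
        # The same small matmul is placed once on every listed device.
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        partials.append(tf.matmul(a, b))

# Gather and add the per-device results on the CPU.
with tf.device('/device:CPU:0'):
    total = tf.add_n(partials)
```

Each entry of total is the single-device product multiplied by the number of devices used, so on a one-device machine it is simply the product of a and b.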

For details, see this guide.
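
For the specific loop in the question, every iteration assigns to the same variable, and as far as I understand tf.function runs stateful ops such as assign in program order, so raising parallel_iterations is unlikely to speed it up. One possible alternative, sketched here under that assumption and not taken from the original answer, is to express all 100000 updates as a single scatter op so that the parallelism happens inside one kernel. The names N and zero_all are hypothetical.

```python
import tensorflow as tf

N = 100000  # same length as `var` in the question

var = tf.Variable(tf.ones([N], dtype=tf.float32))

@tf.function
def zero_all(v):
    # Build every index at once: shape [N, 1], one row per element to update.
    indices = tf.reshape(tf.range(N), [-1, 1])
    updates = tf.zeros([N], dtype=tf.float32)
    # A single scatter replaces the N sequential loop iterations.
    v.assign(tf.tensor_scatter_nd_update(v, indices, updates))

zero_all(var)
```

This only works when the updates do not depend on each other. If iteration k needs the result of iteration k-1, the work is inherently sequential and a loop is still required.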
