How do I get the global_step inside a MonitoredTrainingSession?


I am running a distributed mnist model in distributed TensorFlow, and I would like to monitor the evolution of the global_step "by hand" for debugging. What is the cleanest way to get the global step in a distributed TensorFlow setup?

My code is below:

 ...

with tf.device(device):
  images = tf.placeholder(tf.float32, [None, 784], name='image_input')
  labels = tf.placeholder(tf.float32, [None], name='label_input')
  data = read_data_sets(FLAGS.data_dir,
          one_hot=False,
          fake_data=False)
  logits = mnist.inference(images, FLAGS.hidden1, FLAGS.hidden2)
  loss = mnist.loss(logits, labels)
  loss = tf.Print(loss, [loss], message="Loss = ")
  train_op = mnist.training(loss, FLAGS.learning_rate)

hooks=[tf.train.StopAtStepHook(last_step=FLAGS.nb_steps)]

with tf.train.MonitoredTrainingSession(
    master=target,
    is_chief=(FLAGS.task_index == 0),
    checkpoint_dir=FLAGS.log_dir,
    hooks = hooks) as sess:


  while not sess.should_stop():
    xs, ys = data.train.next_batch(FLAGS.batch_size, fake_data=False)
    sess.run([train_op], feed_dict={images:xs, labels:ys})

    global_step_value = ...  # what is the clean way to get this variable?
Tags: tensorflow, distributed-computing
1 Answer

In general it is good practice to create the global step variable explicitly during graph definition, e.g. global_step = tf.Variable(0, trainable=False, name='global_step'), and to pass it to the optimizer's minimize() call so that it is incremented on every training step. You can then retrieve the tensor easily with graph.get_tensor_by_name("global_step:0") and fetch its current value with sess.run in the training loop.
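A minimal, self-contained sketch of that approach, with a toy one-variable model standing in for the questioner's mnist graph (the variable names and the StopAtStepHook mirror the question; everything else is illustrative):

import tensorflow as tf

# Toy model: 'x', 'y' and the linear fit stand in for the mnist graph
# built in the question.
x = tf.placeholder(tf.float32, [None, 1], name='x')
y = tf.placeholder(tf.float32, [None, 1], name='y')
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

# Create the global step explicitly so it has a well-known name, and pass
# it to minimize() so the optimizer increments it on every training step.
global_step = tf.Variable(0, trainable=False, name='global_step')
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
    loss, global_step=global_step)

hooks = [tf.train.StopAtStepHook(last_step=100)]

with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
  # Look the tensor up by name, as described above (this works even though
  # MonitoredTrainingSession has already finalized the graph) ...
  step_tensor = tf.get_default_graph().get_tensor_by_name('global_step:0')
  while not sess.should_stop():
    # ... and fetch its value in the same run() call as the training op.
    _, global_step_value = sess.run(
        [train_op, step_tensor],
        feed_dict={x: [[1.0]], y: [[2.0]]})
    print('global step = %d' % global_step_value)

Alternatively, if you keep a Python reference to the global_step variable when you build the graph, you can pass it to sess.run directly and skip the lookup by name.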
