How do I allow text input to a TensorFlow model?

Question

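I'm working on a custom text classification model in TensorFlow, and I now want to set it up with TensorFlow Serving for production deployment. The model makes its predictions from a text embedding that is computed by a separate model, and that model requires the raw text to be encoded as a vector.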

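Right now I do this in a somewhat disjointed way: one service does all of the text preprocessing, then computes the embedding, then sends it to the text classifier as an embedded text vector. It would be great if we could bundle all of this into a single TensorFlow Serving model, especially the initial text preprocessing step.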

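This is where I'm stuck. How do you construct a Tensor (or other TensorFlow primitive) that takes raw text as input? And do you need to do anything special to specify the lookup table for the token-to-vector-component mapping, so that it is saved as part of the model bundle?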

For reference, here is a rough approximation of what I have now:

input = tf.placeholder(tf.float32, [None, 510], name='input')

# lots of steps omitted for brevity/clarity

outputs = tf.linalg.matmul(outputs, terminal_layer, transpose_b=True, name='output')

sess = tf.Session()
tf.saved_model.simple_save(sess,
                           'model.pb',
                           inputs={'input': input}, outputs={'output': outputs})

Answer 1

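My model uses a bag-of-words approach rather than preserving word order, although preserving order would be a very simple change to the code.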

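Assuming you have a list object that encodes your vocabulary (I'll call it vocab) and a matrix of the corresponding term/token embeddings you want to use (I'll call it raw_term_embeddings, since I'm coercing it into a tensor), the code will look something like this: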

import numpy as np
import tensorflow as tf

# map each vocabulary term to its integer index
initializer = tf.lookup.KeyValueTensorInitializer(vocab, np.arange(len(vocab)))
lut = tf.lookup.StaticVocabularyTable(initializer, 1)  # the one here is the out of vocab size
lut.initializer.run(session=sess)  # pushes the LUT onto the session

input = tf.placeholder(tf.string, [None, None], name='input')

ones_at = lut.lookup(input)
encoded_text = tf.math.reduce_sum(
    tf.one_hot(ones_at, tf.dtypes.cast(lut.size(), np.int32)),
    axis=0, keepdims=True)

# I didn't build an embedding for the out of vocabulary token
term_embeddings = tf.convert_to_tensor(np.vstack([raw_term_embeddings]), dtype=tf.float32)
embedded_text = tf.linalg.matmul(encoded_text, term_embeddings)

# then use embedded_text for the remainder of the model
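To make the data flow in that snippet easier to follow, here is a minimal plain-Python sketch of the same three steps for a single document: token lookup with one out-of-vocabulary (OOV) bucket, bag-of-words counting, and multiplication by the embedding matrix. It is not TensorFlow code. The three-word vocab, the two-dimensional embeddings, and the helper names lookup, bag_of_words and embed are made up for illustration. The all-zero embedding row for the OOV bucket is an assumption added here so that the shapes line up (as far as I understand, the table size includes the OOV bucket); the answer itself only says that no OOV embedding was built.

```python
# Plain-Python illustration of the lookup -> bag-of-words -> embedding steps.
# All data below is hypothetical and only serves the example.
vocab = ["good", "bad", "movie"]
raw_term_embeddings = [
    [1.0, 0.0],   # embedding for "good"
    [-1.0, 0.0],  # embedding for "bad"
    [0.0, 0.5],   # embedding for "movie"
]

NUM_OOV_BUCKETS = 1  # plays the role of the `1` passed to StaticVocabularyTable
token_to_id = {token: i for i, token in enumerate(vocab)}
oov_id = len(vocab)  # with a single bucket, every unknown token lands here
table_size = len(vocab) + NUM_OOV_BUCKETS


def lookup(tokens):
    """Map each token to its vocabulary index, or to the OOV bucket."""
    return [token_to_id.get(token, oov_id) for token in tokens]


def bag_of_words(token_ids):
    """Sum of one-hot vectors: one count per table slot, word order discarded."""
    counts = [0.0] * table_size
    for token_id in token_ids:
        counts[token_id] += 1.0
    return counts


# Assumption: give the OOV bucket an all-zero embedding row so that the
# count vector (length table_size) and the matrix (table_size rows) agree.
embedding_dim = len(raw_term_embeddings[0])
term_embeddings = raw_term_embeddings + [[0.0] * embedding_dim]


def embed(counts):
    """Vector-matrix product: counts (1 x table_size) times term_embeddings."""
    return [
        sum(counts[i] * term_embeddings[i][d] for i in range(table_size))
        for d in range(embedding_dim)
    ]


tokens = ["good", "good", "movie", "zzz"]  # "zzz" is not in vocab
ids = lookup(tokens)
counts = bag_of_words(ids)
embedded_text = embed(counts)
print(ids, counts, embedded_text)
```

Keeping the list returned by lookup instead of collapsing it with bag_of_words is the order-preserving variant mentioned at the start of the answer.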

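One small trick: also make sure to pass legacy_init_op=tf.tables_initializer() to the save function, which prompts TensorFlow Serving to initialize the lookup table used for text encoding when it loads the model.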
