Embedding a dictionary of 200,000 words in the TensorFlow Functional API

Question

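I have gone through several questions on Stack Overflow and tutorials on Keras and TensorFlow embeddings, but I have not found an answer that fits my case. Let me explain.

I have a dictionary of 200,000 words, of which 10,376 are unique "words". They represent cellular device IDs (IMEIs). In this particular case I want to process them with the Keras Functional API and eventually merge them with numerical data once this problem is solved.

But I cannot get past the embedding part of the first layer.

Here is the code: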

# length of the device list
device_len = len(device)
device_len
200000
# unique devices among the 200000
top_words = len(np.unique(device))
top_words
10376

#keras encoded
encoded_docs = [one_hot(d, top_words) for d in device]

#max length of the vector for each word
max_length = 2
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')

#converted to tensors
padded_docs = tf.convert_to_tensor(padded_docs)
sess = tf.InteractiveSession()  
print(padded_docs.eval())
sess.close()

# here is the network
top_words = 10376
embedding_vector_length = 2
x = Embedding(top_words, embedding_vector_length)(padded_docs)
x = Dense(2, activation='sigmoid')(x)
modelx = Model(inputs=padded_docs, outputs = x)

ValueError: Input tensors to a Model must come from `keras.layers.Input`. Received: Tensor("Const:0", shape=(200000, 2), dtype=int32) (missing previous layer metadata).

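I have checked similar questions and answers but could not find anything that fits my case.

If anyone can help me, I would be grateful.

Thanks a lot indeed.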

Answer 1

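from keras.layers import Input, Embedding, Dense
from keras.models import Model

# doc_length: length of each padded sequence (max_length in the question)
inputs = Input((doc_length,))
x = Embedding(top_words, embedding_vector_length)(inputs)
x = Dense(2, activation='sigmoid')(x)

modelx = Model(inputs=inputs, outputs=x)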

Also, padded_docs needs to be made of integers, not one-hot encodings. The Embedding layer expects integer indices.
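One way to produce such integers is a plain lookup table from each distinct device ID to an index. The following is only a minimal sketch: the `device` values are made-up placeholders for the real list, and `vocab` is a hypothetical helper name that does not appear in the question.

```python
# Toy stand-in for the real list of 200,000 device IDs (values are made up).
device = ["imei_b", "imei_a", "imei_b", "imei_c"]

# Assign each distinct ID an integer index in [0, top_words).
# `vocab` is a hypothetical name used only for this illustration.
vocab = {dev: idx for idx, dev in enumerate(sorted(set(device)))}
top_words = len(vocab)

# One integer per sample, wrapped in a list so every sample has length 1.
encoded_docs = [[vocab[dev]] for dev in device]

print(top_words)
print(encoded_docs)
```

Since every sample here holds exactly one ID, no padding is needed; wrapping `encoded_docs` in `np.asarray` would give an integer array of shape (number of samples, 1).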

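It is important to note that you do not pass it as a tensor but as a regular NumPy array, in order to train with model.fit.

So you need to remove the one_hot and convert_to_tensor parts.

Then you call model.fit(padded_docs, whatever_outputs, .....etc....)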
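Putting the pieces together, a minimal end-to-end sketch could look like this. It assumes TensorFlow 2.x with `tf.keras`, one device index per sample (so `max_length = 1`), and random placeholder targets that exist only so that `fit` has something to train on. None of the data values come from the question.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Dense
from tensorflow.keras.models import Model

# Toy sizes (assumptions for this sketch, not the question's real values).
top_words = 5                # number of distinct device IDs
max_length = 1               # one device index per sample
embedding_vector_length = 2

# Integer indices as a regular NumPy array, not a tf tensor.
padded_docs = np.array([[0], [1], [0], [4], [3], [2]], dtype="int32")

# Placeholder targets matching the model's output shape (samples, 1, 2).
targets = np.random.rand(len(padded_docs), max_length, 2).astype("float32")

# The model starts from a symbolic Input, not from the data itself.
inputs = Input(shape=(max_length,), dtype="int32")
x = Embedding(top_words, embedding_vector_length)(inputs)
x = Dense(2, activation="sigmoid")(x)
modelx = Model(inputs=inputs, outputs=x)

modelx.compile(optimizer="adam", loss="binary_crossentropy")
modelx.fit(padded_docs, targets, epochs=1, verbose=0)

predictions = modelx.predict(padded_docs, verbose=0)
print(predictions.shape)
```

The data only enters at `fit` and `predict`; the graph itself is built from `Input`, which is what the ValueError in the question is asking for.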

Answer 2

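import keras
from keras.layers import Embedding, Dense
from keras.models import Model

# named `inputs` to avoid shadowing the built-in input()
inputs = keras.layers.Input((max_length,))
x = Embedding(top_words, embedding_vector_length)(inputs)
x = Dense(2, activation='sigmoid')(x)
modelx = Model(inputs=inputs, outputs=x)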
