Why is V+1 used in the embedding layer (`Embedding(V+1,D)(i)`), where V is the vocabulary size?

Question

Suppose

from tensorflow.keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer()
...
V = len(tokenizer.word_index)

where `V` is the vocabulary size.

I was told that the embedding layer should be

x = Embedding(V+1,D)(i)

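where `D` is the dimension of the output vectors. But I am not sure why the embedding layer has to have size `(V+1,D)` rather than `(V,D)`, especially since the indices in `tokenizer.word_index` start at 1 rather than 0, i.e.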

tokenizer.word_index
{'UNK': 1,
 'the': 2,
 ',': 3,
 '.': 4,
 'of': 5,
 'and': 6, 
...}

So the largest index of `tokenizer.word_index` (a dictionary), if it is converted to a list, is actually `V-1`.

Why is `V+1` used in the embedding layer (`Embedding(V+1,D)(i)`), where `V` is the vocabulary size?

Answer 1

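The main reason is that the embedding's vocabulary size must be one greater than `len(word_index)` so that the largest token ID can be indexed: `Embedding(V+1,D)` accepts indices from 0 to `V`, and the IDs in `tokenizer.word_index` run from 1 to `V`. See the following link for more details (there is at least one other reason): https://datascience.stackexchange.com/questions/93651/reason-for-adding-1-to-word-index-for-sequence-modeling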
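To make the index arithmetic concrete, here is a minimal pure-Python sketch. It only mimics an embedding lookup with a list of rows and is not Keras's implementation; the `word_index` contents, `D = 4`, and the helpers `make_table` and `lookup` are assumptions made up for illustration.

```python
# Hypothetical stand-in for tokenizer.word_index: IDs start at 1, as in Keras.
word_index = {"UNK": 1, "the": 2, ",": 3, ".": 4, "of": 5, "and": 6}
V = len(word_index)  # 6 words
D = 4                # embedding dimension, chosen arbitrarily


def make_table(rows, dim):
    """Build a lookup table with `rows` rows; valid row indices are 0..rows-1."""
    return [[0.0] * dim for _ in range(rows)]


def lookup(table, token_id):
    """Return the row for token_id, the way an embedding lookup would."""
    if not 0 <= token_id < len(table):
        raise IndexError(f"token id {token_id} out of range [0, {len(table)})")
    return table[token_id]


# The largest ID equals V (not V-1), because the IDs run from 1 to V.
max_id = max(word_index.values())

# V+1 rows cover indices 0..V; row 0 is not used by any word.
table_ok = make_table(V + 1, D)
vec = lookup(table_ok, max_id)  # succeeds

# V rows cover only indices 0..V-1, so the largest ID does not fit.
table_small = make_table(V, D)
try:
    lookup(table_small, max_id)
    too_small_fails = False
except IndexError:
    too_small_fails = True
```

Index 0 is never assigned to a word by the Keras `Tokenizer`, and `pad_sequences` pads with 0 by default, so row 0 of the table typically ends up holding the padding vector.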