我被要求实现简单前缀树的插入方法,其定义如下:
def insert(self, value: Any, weight: float, prefix: list) -> None:
"""Insert the given value into this autocompleter.
The value is inserted with the given weight, and is associated with
the prefix sequence prefix.
If the value has already been inserted into this autocompleter,
then the given weight should be added to the existing weight of this value.
如所描述的,插入方法应该以这种方式运行:如果我们要将单词“pig”插入树中,并且没有存储“pig”的叶子,则tree.insert('pig', 4.0, [ 'p', 'i', 'g']) 应该给我们一个新的前缀树,它从根开始 [] -> ['p'] -> ['p', 'i'] -> [' p'、'i'、'g'] -> '猪'。 SimplePrefixTree 的实例属性有 self.root(返回存储的值)、self.weight(返回浮点型权重)、self.subtrees(以 SimplePrefixTree 列表的形式返回节点的子树),还有其他方法称为is_empty()(检查这棵树是否为空,即 self.root = [], self.weight = 0.0, self.subtrees = []),is_leaf()(检查这棵树是否为叶子,即 self.subtrees = [] 且自重 > 0).
这个方法据说可以通过递归实现,但是我不知道怎么做。有什么帮助吗? 我想查看详细的解释和可以用来模拟的示例代码。
嗯,在实现这个之前,你首先需要一棵树。 使用 littletree (我是作者),我们可以从创建一个树节点开始。
每个树节点需要存储:
import littletree
class AutoCompleter(littletree.BaseNode):
def __init__(self, letter=""):
super().__init__(identifier=letter)
self.weight = 0 # weight of leave nodes
self.total_weight = None # sum of weight for all leaves
需要插入或重建单词。我们将添加一个
$
来标记单词结束的位置。
def insert_word(self, word, weight=1):
"""Adds a word to tree."""
prefix = list(word)
prefix.append('$') # Add a terminal to prefix
# Retrieve or create recursively
word_node = self.path.create(prefix)
word_node.weight += weight
return word_node
def retrieve_word(self):
"""Get full word."""
return "".join(segment.identifier for segment in self.path)
现在我们可以存储和检索单词,我们可以进行预测:
def predict(self, partial=""):
"""Now we can make predictions"""
prefix = list(partial)
prefix_node = self.path.get(prefix)
if prefix_node is not None:
for leaf_node in prefix_node.iter_leaves():
yield leaf_node.retrieve_word()
如果我们希望我们的预测变得更好,我们应该按total_weight进行计算和排序。
def optimize(self):
"""Sort nodes by weights."""
# Calculate total_weights
for node in self.iter_tree(order="post"):
if node.is_leaf:
node.total_weight = node.weight
else:
node.total_weight = sum(c.total_weight for c in node.children)
# Sort from high total_weight to low
self.sort_children(key=operator.attrgetter("total_weight"), recursive=True, reverse=True)
最后我们可以这样使用:
autocompleter = AutoCompleter()
for word in ["pig", "pork", "pigeon", "pig", "pheasant", "pinfish", "pink salmon"]:
autocompleter.insert_word(word)
autocompleter.optimize()
for prediction in autocompleter.predict("pi"):
print(prediction)
我们对“pi”的最高预测是:
pig$
pigeon$
pinfish$
pink salmon$
可以根据您想要实现的目标来调整其中的各个部分。
在内部,构建的树看起来像这样:
└─ p (7)
├─ i (5)
│ ├─ g (3)
│ │ ├─ $ (2)
│ │ └─ e (1)
│ │ └─ o (1)
│ │ └─ n (1)
│ │ └─ $ (1)
...