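Since last month, NLTK dispersion plots seem to have the y (vertical) axis in reverse order on my machine. This may be related to my software versions (I'm using a school virtual machine).
Versions: NLTK 3.8.1, matplotlib 3.7.2, Python 3.9.13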
Code:
```python
from nltk.draw.dispersion import dispersion_plot

words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']
dispersion_plot(words, targets)
```
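Expected: `aa` appears near the beginning and `cccc` near the end. Actual: it's backwards! Also note that `f` should have no marks at all; instead, it is `bbb` that has none.
Conclusion: the y axis is reversed.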
I found the source code of `nltk.draw.dispersion`, and it seems to have a bug:
```python
def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Plot"):
    """
    Generate a lexical dispersion plot.

    :param text: The source text
    :type text: list(str) or iter(str)
    :param words: The target words
    :type words: list of str
    :param ignore_case: flag to set if case should be ignored when searching text
    :type ignore_case: bool
    :return: a matplotlib Axes object that may still be modified before plotting
    :rtype: Axes
    """

    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "The plot function requires matplotlib to be installed. "
            "See https://matplotlib.org/"
        ) from e

    word2y = {
        word.casefold() if ignore_case else word: y
        for y, word in enumerate(reversed(words))  # <--- HERE
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)

    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(words))), words, color="C0")  # <--- HERE
    ax.set_ylim(-1, len(words))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax


if __name__ == "__main__":
    import matplotlib.pyplot as plt
    from nltk.corpus import gutenberg

    words = ["Elinor", "Marianne", "Edward", "Willoughby"]
    dispersion_plot(gutenberg.words("austen-sense.txt"), words)
    plt.show()
```
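It builds `word2y` from `reversed(words)` (see `for y, word in enumerate(reversed(words))`), but then labels the ticks with `words` in `ax.set_yticks()`. As a result, row 0 holds the marks of the *last* target word but is labelled with the *first* one, and so on. It should either pass `reversed(words)` to `ax.set_yticks()`, or build `word2y` without `reversed()`. I marked both places with `# <--- HERE` in the code above.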
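The mismatch can be checked without drawing anything. The sketch below copies the bookkeeping from the two lines marked `# <--- HERE`, applied to the question's `targets`, and records which tick label ends up next to each word's row. The variable names `tick_labels`, `shown_as` and `fixed` are only for illustration.

```python
targets = ['aa', 'bbb', 'f', 'cccc']

# As in the NLTK 3.8.1 source: row positions come from the *reversed* list.
word2y = {word: y for y, word in enumerate(reversed(targets))}

# As in ax.set_yticks(range(len(words)), words): labels come from the *original* list.
tick_labels = dict(enumerate(targets))

# For each word: the label printed next to the row where its marks are drawn.
shown_as = {word: tick_labels[word2y[word]] for word in targets}
print(shown_as)  # {'aa': 'cccc', 'bbb': 'f', 'f': 'bbb', 'cccc': 'aa'}

# With the fix, labels are taken from reversed(targets), same as word2y.
fixed_labels = dict(enumerate(reversed(targets)))
fixed = {word: fixed_labels[word2y[word]] for word in targets}
print(fixed)  # {'aa': 'aa', 'bbb': 'bbb', 'f': 'f', 'cccc': 'cccc'}
```

This reproduces exactly what the plot shows: the marks of `aa` sit next to the `cccc` label, and the empty row of `f` is labelled `bbb`.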
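This should be reported as an issue.

In the meantime, you can take the `ax` that `dispersion_plot` returns and fix the labels yourself with `set_yticks` and `reversed`. Note that in your code the list of target words is called `targets`, not `words`: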
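```python
ax = dispersion_plot(words, targets)
# Relabel the ticks in the same (reversed) order that dispersion_plot used for the rows
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
```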
Full working code:
```python
import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot

words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']

ax = dispersion_plot(words, targets)
# Workaround: label the rows in the same reversed order used to position them
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")
plt.show()
```
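If you would rather patch the function than relabel afterwards, one option is to derive the row positions and the tick labels from a single ordered list so they cannot drift apart. Below is a minimal sketch of that idea as a pure-Python helper; `dispersion_coords` is a hypothetical name, not part of NLTK. It keeps NLTK's apparent intent of putting the first target word on the top row.

```python
def dispersion_coords(text, words, ignore_case=False):
    """Hypothetical helper: return (xs, ys, labels) for a dispersion plot.

    Row positions and tick labels both come from the same ordered list,
    so labels[y] is always the word whose marks are drawn at row y.
    """
    ordered = list(reversed(words))  # row 0 (bottom) holds the last target word

    def norm(w):
        return w.casefold() if ignore_case else w

    word2y = {norm(word): y for y, word in enumerate(ordered)}

    xs, ys = [], []
    for x, token in enumerate(text):
        y = word2y.get(norm(token))
        if y is not None:
            xs.append(x)
            ys.append(y)
    return xs, ys, ordered


words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']
xs, ys, labels = dispersion_coords(words, targets)
print(labels)  # ['cccc', 'f', 'bbb', 'aa']
```

To draw it, mirror the calls in the NLTK source: `ax.plot(xs, ys, "|")`, then `ax.set_yticks(list(range(len(labels))), labels)` and `ax.set_ylim(-1, len(labels))`. Because `labels` is the same list that assigned the rows, `aa` ends up on the top row with its own label.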