Python NLTK text dispersion plot has its y (vertical) axis in backwards/reversed order

Question

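Since last month, NLTK dispersion plots on my machine seem to have the y (vertical) axis in reversed order. This may be related to my software versions (I am using a school virtual machine).

Versions: NLTK 3.8.1, matplotlib 3.7.2, Python 3.9.13

Code: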

from nltk.draw.dispersion import dispersion_plot
words=['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets=['aa','bbb', 'f', 'cccc']
dispersion_plot(words, targets)

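Expected: aa appears at the beginning of the text and cccc at the end. Actual: the plot shows it backwards! Also note that f should have no marks at all; instead, bbb is the one shown as absent.

Conclusion: the y axis is backwards.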

Answer 1

I found the source code of nltk.draw.dispersion, and it appears to contain a bug.

def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Plot"):
    """
    Generate a lexical dispersion plot.

    :param text: The source text
    :type text: list(str) or iter(str)
    :param words: The target words
    :type words: list of str
    :param ignore_case: flag to set if case should be ignored when searching text
    :type ignore_case: bool
    :return: a matplotlib Axes object that may still be modified before plotting
    :rtype: Axes
    """

    try:
        import matplotlib.pyplot as plt
    except ImportError as e:
        raise ImportError(
            "The plot function requires matplotlib to be installed. "
            "See https://matplotlib.org/"
        ) from e

    word2y = {
        word.casefold() if ignore_case else word: y
        for y, word in enumerate(reversed(words))  # <--- HERE
    }
    xs, ys = [], []
    for x, token in enumerate(text):
        token = token.casefold() if ignore_case else token
        y = word2y.get(token)
        if y is not None:
            xs.append(x)
            ys.append(y)

    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(words))), words, color="C0")  # <--- HERE
    ax.set_ylim(-1, len(words))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax



if __name__ == "__main__":
    import matplotlib.pyplot as plt

    from nltk.corpus import gutenberg

    words = ["Elinor", "Marianne", "Edward", "Willoughby"]
    dispersion_plot(gutenberg.words("austen-sense.txt"), words)
    plt.show()

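It computes word2y from reversed(words) (see: for y, word in enumerate(reversed(words))), but later it passes plain words to ax.set_yticks(). So the marks are positioned in reversed order while the tick labels are not, and every label ends up next to the wrong row. Either ax.set_yticks() should use reversed(words), or word2y should be computed without reversed(). I added # <--- HERE to the code above to mark both places.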
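The mismatch can be reproduced without drawing anything. The following is only an illustrative sketch using the targets list from the question; buggy_labels and fixed_labels are names made up here and are not part of NLTK.

```python
# Illustrative reproduction of the label mismatch; no plotting required.
targets = ['aa', 'bbb', 'f', 'cccc']

# How dispersion_plot positions the marks: the LAST target gets y=0.
word2y = {word: y for y, word in enumerate(reversed(targets))}

# How dispersion_plot labels the ticks: the FIRST target is written at y=0.
buggy_labels = dict(enumerate(targets))

# Labels that would agree with word2y (hypothetical fix).
fixed_labels = dict(enumerate(reversed(targets)))

for word, y in word2y.items():
    print(f"{word!r}: drawn at y={y}, "
          f"labelled {buggy_labels[y]!r}, should be {fixed_labels[y]!r}")
```

The marks for bbb land on the row labelled f, and the empty row that belongs to f is labelled bbb, which matches what the question describes.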

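This should be reported as an issue.

Until it is fixed, you can take the returned ax and correct the labels yourself with set_yticks and reversed. In your code the list is called targets rather than words: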
ax = dispersion_plot(words, targets)

ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")

Full working code:

import matplotlib.pyplot as plt
from nltk.draw.dispersion import dispersion_plot

words = ['aa','aa','aa','bbb','cccc','aa','bbb','aa','aa','aa','cccc','cccc','cccc','cccc']
targets = ['aa','bbb', 'f', 'cccc']

ax = dispersion_plot(words, targets)
ax.set_yticks(list(range(len(targets))), reversed(targets), color="C0")

plt.show()
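Note that this workaround assumes the installed NLTK still has the mismatch; on a version where the library is corrected, reversing the labels again would flip them back to the wrong order. If that is a concern, one option is to draw the plot yourself. The sketch below is not NLTK code: dispersion_points and plot_dispersion are hypothetical helper names, and the body simply mirrors the source shown above while deriving the positions and the labels from the same reversed list.

```python
def dispersion_points(text, targets, ignore_case=False):
    """Return (xs, ys, labels), where labels[y] names the word drawn at height y."""
    norm = str.casefold if ignore_case else (lambda s: s)
    labels = list(reversed(targets))  # first target ends up on the top row
    word2y = {norm(word): y for y, word in enumerate(labels)}
    xs, ys = [], []
    for x, token in enumerate(text):
        y = word2y.get(norm(token))
        if y is not None:
            xs.append(x)
            ys.append(y)
    return xs, ys, labels


def plot_dispersion(text, targets, ignore_case=False,
                    title="Lexical Dispersion Plot"):
    """Draw the plot using one list for both positions and tick labels."""
    import matplotlib.pyplot as plt  # imported lazily, as in the NLTK source

    xs, ys, labels = dispersion_points(text, targets, ignore_case)
    _, ax = plt.subplots()
    ax.plot(xs, ys, "|")
    ax.set_yticks(list(range(len(labels))))
    ax.set_yticklabels(labels, color="C0")
    ax.set_ylim(-1, len(labels))
    ax.set_title(title)
    ax.set_xlabel("Word Offset")
    return ax
```

With the data from the question, ax = plot_dispersion(words, targets) followed by plt.show() should put aa on the top row and leave the f row empty.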
