An error from a line of code that doesn't even exist in my file


I made a Flask application that takes two string inputs from the user through a form and processes those strings with NLTK's PorterStemmer and stopwords. The problem is that PythonAnywhere tells me it cannot find the stopwords file in nltk_data. I have tried to fix it several times, but it still does not work.

I have tried reloading the web app several times, clearing cached files, and deleting the whole application and uploading the files again, and I have run every iteration of the code below successfully on my local machine.

Here is the code:

import re

import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

nltk.download('stopwords')

port_stem = PorterStemmer()

def stemming(content):
    # keep only letters and digits, then lowercase and split on whitespace
    stemmed_content = re.sub('[^0-9a-zA-Z]', ' ', content)
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    # drop English stopwords and stem whatever remains
    stemmed_content = [
        port_stem.stem(word)
        for word in stemmed_content
        if word not in stopwords.words('english')
    ]
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content
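For context, nltk.download('stopwords') with no arguments saves the corpus to a default location that may not be on NLTK's search path when the web app runs on PythonAnywhere. A minimal sketch of downloading it once, from a console, into a directory that is then added to that search path (the /home/NewsValidator/nltk_data location is an assumption taken from the path used later in this post):

import nltk

# assumption: the account home directory is /home/NewsValidator, matching the
# path that appears later in this post; run the download once from a console
nltk.download('stopwords', download_dir='/home/NewsValidator/nltk_data')

# make sure the web app searches the same directory before using the corpus
nltk.data.path.append('/home/NewsValidator/nltk_data')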

This code works on my machine (obviously), but when I deploy it on PythonAnywhere, the HTML file loads while this code raises the error described above: NLTK cannot find the stopwords file in nltk_data.

I was confused, so I searched Stack Overflow and found an answer that suggested setting the path like this:

nltk.data.path.append("home/NewsValidator/nltk_data/")
I did that, but the same error message still pops up.
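For reference, a minimal sketch of what that path-append approach usually looks like with an absolute path (the snippet above is missing the leading slash; NewsValidator is taken from that same path and may not be the real account name), including a check that NLTK can actually locate the corpus:

import nltk

# assumption: the stopwords corpus was downloaded under /home/NewsValidator/nltk_data
nltk.data.path.append('/home/NewsValidator/nltk_data')

# raises LookupError (listing every searched directory) if the corpus is still missing
nltk.data.find('corpora/stopwords')

from nltk.corpus import stopwords
english_stopwords = set(stopwords.words('english'))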

So I did the next logical thing, which was this:

import re

from nltk.stem.porter import PorterStemmer

# hard-coded copy of NLTK's English stopword list, so nltk_data is no longer needed
stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]

port_stem = PorterStemmer()

def stemming(content):

    stemmed_content = re.sub('[^0-9a-zA-Z]', ' ', content)
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    stemmed_content = [
        port_stem.stem(word)
        for word in stemmed_content
        if word not in stopwords  # plain list membership, no nltk lookup
    ]
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content
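As a quick local check that this variant never touches nltk_data, calling the function on a throwaway sentence (my own example, not from the original post) is enough:

# prints the lowercased, stemmed tokens with the hard-coded stopwords removed
print(stemming("These articles were not validated by the reviewers!"))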

It still shows the same error log, even though I am not calling stopwords from nltk at all anymore.

Finally, I did something like this:

import re

from nltk.stem.porter import PorterStemmer

port_stem = PorterStemmer()

def stemming(content):
    # same preprocessing as before, but with no stopword filtering at all
    stemmed_content = re.sub('[^0-9a-zA-Z]', ' ', content)
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    stemmed_content = [
        port_stem.stem(word)
        for word in stemmed_content
    ]
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content

I tested this on my local machine, and my model is still accurate enough after the change, losing only about 1% accuracy (which I can live with at this point).

And guess what, it still shows the same error!!!! At this point I am not even using the term "stopwords" anymore.
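Since the traceback apparently points at a stopwords call that no longer exists in the file, one sanity check (my own suggestion, with a hypothetical module name) would be to have the deployed app report which source file it actually imported and whether that copy still contains the word stopwords:

import inspect

import news_validator_app  # hypothetical: the module that defines stemming()

# path of the file the running web app really loaded
source_path = inspect.getsourcefile(news_validator_app)
print(source_path)

# True would mean PythonAnywhere is still importing an old copy of the code
print('stopwords' in open(source_path).read())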

python machine-learning nltk pythonanywhere