I am analyzing a given text to identify its "functional requirements". To do this, I extract the nouns and verbs from the text and apply KMeans clustering to group semantically similar words. I then identify functional requirements by focusing on clusters that contain many verbs and proper nouns, since these typically represent actionable tasks and specific entities, respectively. However, the clusters come out differently every time I run the code, so I would like to know whether there is any way to make the output identical across runs, so that I can apply further logic to it.
I would also greatly appreciate any feedback, comments, or improvements on this approach to extracting functional requirements for SRS generation.
Here is the code:
import spacy
import numpy as np
from sklearn.cluster import KMeans
from spacy.lang.en.stop_words import STOP_WORDS
import re
import nltk
# nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
from textblob import TextBlob
nlp = spacy.load("en_core_web_sm")
nouns = set()
verbs = set()
sentence = "A hotel has a certain number of rooms. Each room can be either single bed or double bed type and may be AC or Non-AC type. Guests can reserve rooms in advance or can reserve rooms on the spot depending upon availability of rooms. The receptionist would enter data pertaining to guests such as their arrival time, advance paid, approximate duration of stay, and the type of the room required. Depending on this data and subject to the availability of a suitable room, the computer would allot a room number to the guest and assign a unique token number to each guest. If the guest cannot be accommodated, the computer generates an apology message. The hotel catering services manager would input the quantity and type of food items as and when consumed by the guest, the token number of the guest, and the corresponding date and time. When a customer prepares to check-out, the hotel automation software should generate the entire bill for the customer and also print the balance amount payable by him. During check-out, guests can opt to register themselves for a frequent guests program."
clean_text = re.sub(r'[^A-Za-z\s]', '', sentence)
# Tokenize the sentence and remove stopwords
tokens = nlp(clean_text)
filtered_words = [token.text for token in tokens if token.text.lower() not in STOP_WORDS]
filtered_sentence = ' '.join(filtered_words)
blob = TextBlob(filtered_sentence)
tags = blob.tags
# print(tags)
for word, tag in tags:
    if tag.startswith('NN'):  # Check if the tag indicates a noun
        nouns.add(word.lower())
    elif tag.startswith('VB'):  # Check if the tag indicates a verb
        verbs.add(word.lower())
# print("Nouns => ", nouns)
# print("Verbs => ", verbs)
combined_sentence = list(nouns) + list(verbs)
newstr = ' '.join(combined_sentence)
filtered_tokens = nlp(newstr)
word_vectors = np.array([token.vector for token in filtered_tokens])
# Perform K-means clustering on the word vectors
num_clusters = 15
kmeans = KMeans(n_clusters=num_clusters)
kmeans.fit(word_vectors)
# Retrieve the cluster labels for each filtered token
cluster_labels = kmeans.labels_
word_clusters = {}
for i, token in enumerate(filtered_tokens):
    if token.text == ".":
        continue
    if cluster_labels[i] in word_clusters:
        word_clusters[cluster_labels[i]].append(token.text)
    else:
        word_clusters[cluster_labels[i]] = [token.text]
for cluster_id, words in word_clusters.items():
    print(f"Cluster {cluster_id + 1}: {', '.join(words)}")
# Calculate verb density threshold based on total words in clusters
threshold = len(filtered_tokens) // num_clusters
print("\nFunctional Requirements:")
for cluster_id, words in word_clusters.items():
    # Counts verbs and proper nouns (despite the variable name)
    verb_count = sum(1 for word in words if nlp(word)[0].pos_ in ('VERB', 'PROPN'))
    if verb_count >= 1:
        print(f"Cluster {cluster_id + 1}: {', '.join(words)}")
What you need is the random_state parameter of the KMeans constructor (see kmeans_docs).
As a simpler example, run the following code in a Jupyter notebook cell:
import numpy as np
from sklearn.cluster import KMeans

np.random.seed(seed=20)
test = np.random.random(size=(100, 6))
kmeans = KMeans(n_clusters=4, random_state=159)
kmeans.fit(test)
kmeans.predict(test)
Then note that running the following code in a different cell always gives the same result:
new_kmeans = KMeans(n_clusters=4, random_state=159)
new_kmeans.fit(test)
new_kmeans.predict(test)
(Even after restarting the kernel, you will still get the same result.)
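Applied to the clustering step in your pipeline, a minimal sketch looks like the following. Note that the random word vectors here are only placeholders standing in for your spaCy token vectors, and the seed value 42 is an arbitrary choice; any fixed integer works.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder vectors standing in for the spaCy token vectors in the question
# (60 "words", 96-dimensional, generated from a fixed seed).
rng = np.random.default_rng(0)
word_vectors = rng.random((60, 96))

num_clusters = 15
# Fixing random_state makes the centroid initialization reproducible,
# so kmeans.labels_ comes out identical on every run.
kmeans = KMeans(n_clusters=num_clusters, random_state=42, n_init=10)
kmeans.fit(word_vectors)
labels_first = kmeans.labels_.copy()

# Re-fitting a fresh estimator with the same seed and data reproduces the labels.
kmeans_again = KMeans(n_clusters=num_clusters, random_state=42, n_init=10)
kmeans_again.fit(word_vectors)
assert np.array_equal(labels_first, kmeans_again.labels_)
```

Passing n_init explicitly also silences the deprecation warning that newer scikit-learn versions emit about its changing default.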