我正在使用lstm方法实施情绪分析,在此我已经完成了训练模型以及预测部分。但是我的预测出现在一列中。下面我将向您展示。
这是我的代码:
with open('output1.json', 'w') as f:
json.dump(new_data, f)
selection1 = new_data['selection1']
#creating empty list to be able to create a dataframe
names = []
dates = []
commentss = []
labels = []
hotelname = []
for item in selection1:
name = item['name']
hotelname.append(name)
#print ('>>>>>>>>>>>>>>>>>> ', name)
Date = item['reviews']
for d in Date:
names.append(name)
#convert date from 'january 12, 2020' to 2020-01-02
date = pd.to_datetime(d['date']).strftime("%Y-%m-%d")
#adding date to the empty list dates[]
dates.append(date)
#print('>>>>>>>>>>>>>>>>>> ', date)
CommentID = item['reviews']
for com in CommentID:
comment = com['review']
lcomment = comment.lower() # converting all to lowercase
result = re.sub(r'\d+', '', lcomment) # remove numbers
results = (result.translate(
str.maketrans('', '', string.punctuation))).strip() # remove punctuations and white spaces
comments = remove_stopwords(results)
commentss.append(comment)
# print('>>>>>>',comments)
#add the words in comments that are already present in the keys of dictionary
encoded_samples = [[word2id[word] for word in comments if word in word2id.keys()]]
# Padding
encoded_samples = keras.preprocessing.sequence.pad_sequences(encoded_samples, maxlen=max_words)
# Make predictions
label_probs, attentions = model_with_attentions.predict(encoded_samples)
label_probs = {id2label[_id]: prob for (label, _id), prob in zip(label2id.items(), label_probs[0])}
labels.append(label_probs)
#creating dataframe
dataframe={'name': names,'date': dates, 'comment': commentss, 'classification': labels}
table = pd.DataFrame(dataframe, columns=['name', 'date', 'comment', 'classification'])
json = table.to_json('hotel.json', orient='records')
这是我获得的结果:
[
{
"name": "Radisson Blu Azuri Resort & Spa",
"date": "February 02, 2020",
"comment": [
"enjoy",
"daily",
"package",
"start",
"welcoming",
"end",
"recommend",
"hotel"
],
"label": {
"joy": 0.0791392997,
"surprise": 0.0002606699,
"love": 0.4324670732,
"sadness": 0.2866959572,
"fear": 0.0002588668,
"anger": 0.2011781186
}
},
您可以在此链接上找到完整的输出:https://jsonblob.com/a9b4035c-5576-11ea-afe8-1d95b3a2e3fd
是否可以将标签字段分成如下所示的单独字段?
[
{
"name": "Radisson Blu Azuri Resort & Spa",
"date": "February 02, 2020",
"comment": [
"enjoy",
"daily",
"package",
"start",
"welcoming",
"end",
"recommend",
"hotel"
],
"joy": 0.0791392997,
"surprise": 0.0002606699,
"love": 0.4324670732,
"sadness": 0.2866959572,
"fear": 0.0002588668,
"anger": 0.2011781186
},
有人可以帮我,我该如何修改我的代码并使这成为可能,请大家向我解释。。
如果您在生成结果之前无法执行此操作,则可以像这样轻松地操作该词典:
def move_labels_to_dict_root(result):
labels = result["labels"]
meta_data = result
del meta_data["labels"]
result = {**meta_data, **labels}
return result
然后在列表理解中,如move_labels_to_dict_root
调用[move_labels_to_dict_root(result) for result in results]
。
但是,我想问你为什么要这样做?