Topic modeling by data category


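I would like to know whether there is a way to get topics with LDA by running the topic modeling per category instead of over the whole dataset.

My data looks like this.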

Comment                                                                                  Division
Smooth execution of Regional Administration in my absence. Well done.                    Finance
Job well done in completing CPs and making the facility available well in time.          Finance
Good Job performed on the successful implementation of Cash IVR.                         Commercial

I would like to get the topics by division.

My current code gives me topics for the dataset as a whole.

library(udpipe)
library(topicmodels)

# Annotate the comments; `ud_model` is assumed to be an already loaded udpipe model
x <- udpipe(x = pnotips$Feedback.Comments, object = ud_model)
# Treat every sentence as its own document for the topic model
x$topic_level_id <- unique_identifier(x, fields = c("doc_id", "paragraph_id", "sentence_id"))

# Keep only nouns and adjectives
dtf <- subset(x, upos %in% c("NOUN", "ADJ"))

# Build a document-term matrix on lemmas and drop terms seen fewer than 3 times
dtf <- document_term_frequencies(dtf, document = "topic_level_id", term = "lemma")
dtm <- document_term_matrix(x = dtf)
dtm_clean <- dtm_remove_lowfreq(dtm, minfreq = 3)
## Build topic model + get topic terminology
m <- LDA(dtm_clean, k = 4, method = "Gibbs",
         control = list(nstart = 5, burnin = 2000, best = TRUE, seed = 1:5))
topicterminology <- predict(m, type = "terms", min_posterior = 0.025, min_terms = 5)

I would like to get the topics for each division_name.

Sample data

  structure(list(Feedback.Comments = c("Excellent kick start of p", 
"Nauman is very collaborative when it comes to team deliverable. He takes ownership and ensure to support whenever needed or asked for. ", 
"Thank you for being very collaborative and designing and planning the whole workshop that deemed success today for R", 
"Amazing knowledge sharing session conducted by you. Truly innovative.", 
"Thanks a lot for your collaboration during my training dates", 
"During Prepaid Consolidation Step 1, you have done excellent job in handling the Mediation stream resulting in a smooth delivery.  The highlights of this delivery was the collaboration which was executed excellently.", 
"He handles all the organization customers in a very collaborative manner.", 
"Noor ul Amin is very supportive and initiative hungry person, always take very quick/bold step when ever any issue happened. ", 
"Keeping check on timely rectification of observations by HSSE with good speed.", 
"Smooth execution of Regional Administration in my absence. Well done.", 
"Good Job performed on the successful implementation of Jazz Cash IVR. the 1st selfservice IVR for financial transactions in industry.", 
"Despite challenges on the resource side you have done exceptionally well in managing the UATs, Prepaid Consolidation and assigned tasks.\n\nWe need focus more on FCR & NPS related areas so we are able to meet our KPIs, looking forward for stats and feedback on time. It would be better if we dedicate one resources on this side and not deploy all resources on prepaid consolidation (it will not give us any benefit)", 
"Job well done in reorganizing all the investments to fixed portfolios.\n Keep it up.", 
"Well done in reorganizing PF process and resolving legacy issues.", 
"Job well done in completing CPs and making the facility available well in time.", 
"You always seems supportive on these requests.sometimes you also submitted input in late hours of the day. Keep ot up.", 
"Well done on completing Hiperos screening for almost 30 profiles. Please pass the feedback to Khurram and Babar.", 
"You always make your concerns clear at a judgment. It always good to have a critical view on things, helps avoiding mistakes. Keep it up.", 
"Both FLT in Lahore and Karachi were planned, managed and executed to the perfection under your lead. Wonderful collaboration with P&O and cross functional teams. Good job and good management. ", 
"Very good resource. Always up to the expectations.\nDid good job in back office evaluations"
), division_name = c("People & Organization", "People & Organization", 
"People & Organization", "People & Organization", "People & Organization", 
"Technology", "Finance", "Finance", "People & Organization", 
"People & Organization", "Commercial", "Commercial", "Finance", 
"Finance", "Finance", "Finance", "Finance", "Finance", "People & Organization", 
"Commercial")), row.names = c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 17L, 21L, 23L, 24L, 25L, 29L, 31L, 32L, 35L, 37L), class = "data.frame")


r lda topic-modeling
1 Answer
0 votes

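You can run a separate topic model for each division. This is easy to do: subset the data by division (with something like department1data <- subset(mydata, division == "Finance")) and then run the topic model on each subset. That gives you the topics discussed within each division and should work well.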
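A minimal sketch of the per-division approach, reusing the pipeline from the question. The helper names `fit_by_group` and `lda_terms` are made up for illustration, and the code assumes that `pnotips` has the columns `Feedback.Comments` and `division_name` (as in the sample data) and that `ud_model` is already loaded. Divisions with very few comments are skipped, because an LDA fit on one or two sentences is not meaningful.

```r
library(udpipe)
library(topicmodels)

# Hypothetical helper: split the texts by group and apply `fit_fun` to each
# group that has at least `min_docs` texts.
fit_by_group <- function(texts, groups, fit_fun, min_docs = 2, ...) {
  parts <- split(texts, groups)
  parts <- parts[lengths(parts) >= min_docs]
  lapply(parts, fit_fun, ...)
}

# Hypothetical helper: the question's pipeline wrapped into a function that
# returns the topic terminology for one set of comments.
lda_terms <- function(comments, ud_model, k = 4, minfreq = 3) {
  x <- udpipe(x = comments, object = ud_model)
  x$topic_level_id <- unique_identifier(
    x, fields = c("doc_id", "paragraph_id", "sentence_id"))
  dtf <- subset(x, upos %in% c("NOUN", "ADJ"))
  dtf <- document_term_frequencies(dtf, document = "topic_level_id",
                                   term = "lemma")
  dtm <- document_term_matrix(x = dtf)
  dtm <- dtm_remove_lowfreq(dtm, minfreq = minfreq)
  m <- LDA(dtm, k = k, method = "Gibbs",
           control = list(nstart = 5, burnin = 2000, best = TRUE, seed = 1:5))
  predict(m, type = "terms", min_posterior = 0.025, min_terms = 5)
}

# Usage, assuming `pnotips` and `ud_model` exist as in the question:
# topics_by_division <- fit_by_group(pnotips$Feedback.Comments,
#                                    pnotips$division_name,
#                                    lda_terms, ud_model = ud_model)
# topics_by_division[["Finance"]]
```

On small subsets, `minfreq = 3` can remove most of the vocabulary, so `minfreq` and `k` will probably need to be lowered per division.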

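Alternatively, you can use the `posterior` method or something similar to look at the topic loadings of each document from each division, for example by averaging them. I find the tidy package useful when working with LDA in R (if you want to go that route, see section 6.2.2 of the linked tutorial).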
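The averaging idea can be sketched in base R as follows. `mean_topic_loadings` is a made-up helper name; it takes a document-by-topic matrix, such as the `topics` element returned by `topicmodels::posterior(m)`, together with one group label per row, and returns the mean loading of every topic within each group. Aligning the rows of the matrix with the divisions is left as an assumption in the commented usage, since in the question the rows are sentences rather than whole comments.

```r
# Hypothetical helper: average the per-document topic loadings within groups.
# `topics` is a numeric matrix (documents x topics); `groups` has one label
# per row of `topics`.
mean_topic_loadings <- function(topics, groups) {
  stopifnot(nrow(topics) == length(groups))
  sums <- rowsum(topics, group = groups)      # per-group column sums
  counts <- table(groups)[rownames(sums)]     # number of rows per group
  sums / as.vector(counts)                    # per-group means
}

# Usage, assuming `m`, `x` and `pnotips` exist as in the question and that
# udpipe labelled the comments doc1, doc2, ... in input order:
# topics <- topicmodels::posterior(m)$topics
# ids <- unique(x[, c("doc_id", "topic_level_id")])
# doc <- ids$doc_id[match(rownames(topics), ids$topic_level_id)]
# division <- pnotips$division_name[as.integer(sub("doc", "", doc))]
# mean_topic_loadings(topics, division)
```

Each row of the result describes how strongly a division loads on each of the shared topics, so the divisions stay comparable because they all come from one model.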
