Topic modeling of quotes


Based on the following link: Quotes

With the help of the following code (the site is JavaScript-based, so I disabled it first):

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("https://quotes.toscrape.com/")

# Collect the author elements and the quote-text elements on the page
author_elems = browser.find_elements(By.CLASS_NAME, "author")
quote_elems = browser.find_elements(By.CLASS_NAME, "text")

authors = [elem.text for elem in author_elems]
quotes = [elem.text for elem in quote_elems]
print(authors)
print(quotes)

# Save the author/quote pairs to a CSV file
author_sayings = pd.DataFrame({"Author": authors, "Quotes": quotes})
author_sayings.to_csv("quotes.csv", index=False)
print(author_sayings.head())
browser.quit()

I have the author and quote information in a CSV file, and then I read it as shown below:

import pandas as pd
from bertopic import BERTopic

model = BERTopic()

summarization = []
data = pd.read_csv("quotes.csv")
print(data.head())
for index, row in data.iterrows():
    topics, probs = model.fit_transform([row['Quotes']])
    print(topics)

This is the result:

   Author                                             Quotes
0  Albert Einstein  “The world as we have created it is a process ...
1     J.K. Rowling  “It is our choices, Harry, that show what we t...
2  Albert Einstein  “There are only two ways to live your life. On...
3      Jane Austen  “The person, be it gentleman or lady, who has ...
4   Marilyn Monroe  “Imperfection is beauty, madness is genius and...

I also want to use the BERTopic model to detect the topics on the given site: Topic Modeling

But my code gives me the following error:

ValueError: Transform unavailable when model was fit with only a single data sample.

Could you help me fix this? How can I detect the topics that appear in the sentences?

python bert-language-model topic-modeling
1 Answer

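You should train on all the quotes at once rather than one at a time. Each call to fit_transform fits the model from scratch, so passing a single quote means the model is fit on a single data sample, which is exactly what the error message reports. So instead of

for index, row in data.iterrows():
    topics, probs = model.fit_transform([row['Quotes']])
    print(topics)

try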

topics, probs = model.fit_transform(data['Quotes'].tolist())
data['Topic'] = topics
data['Probability'] = probs
print(data.head())
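
If you want to guard against the same mistake and look at which quotes ended up in which topic, something along these lines may help. This is only a minimal sketch: the helper names `fit_quotes_in_one_batch` and `group_by_topic` are hypothetical (they are not part of BERTopic), and the only thing assumed about `model` is that it has a BERTopic-style `fit_transform(docs)` method returning `(topics, probs)`.

```python
from collections import defaultdict


def fit_quotes_in_one_batch(model, quotes):
    """Fit a topic model once on the whole list of quotes.

    `model` is any object with a BERTopic-style `fit_transform(docs)`
    method that returns `(topics, probs)`. Hypothetical helper.
    """
    # Drop missing or empty rows so every document is a non-empty string.
    docs = [q.strip() for q in quotes if isinstance(q, str) and q.strip()]
    if len(docs) < 2:
        # Fitting on one document is what triggered the ValueError in the question.
        raise ValueError(f"need at least 2 documents, got {len(docs)}")
    topics, _probs = model.fit_transform(docs)
    return docs, list(topics)


def group_by_topic(docs, topics):
    """Map each topic id to the documents assigned to it."""
    grouped = defaultdict(list)
    for doc, topic in zip(docs, topics):
        grouped[topic].append(doc)
    return dict(grouped)


# Assumed usage with the real library and the CSV from the question:
# from bertopic import BERTopic
# docs, topics = fit_quotes_in_one_batch(BERTopic(), data["Quotes"].tolist())
# for topic_id, members in group_by_topic(docs, topics).items():
#     print(topic_id, len(members))
```

In BERTopic, topic -1 is the outlier label. With only a handful of quotes (a single scraped page, for example) you may find that most or all of them land there, so collecting more pages before fitting is probably worthwhile.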