Based on the following link: Quotes
With the help of the following code (the site is JavaScript-based, so I disabled JavaScript first):
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()
browser.get("https://quotes.toscrape.com/")
elem = browser.find_elements(By.CLASS_NAME, 'author')  # Find the author elements
quot_choosing = browser.find_elements(By.CLASS_NAME, 'text')  # Find the quote text elements
autors = []
quotes = []
for author in elem:
    autors.append(author.text)
for quote in quot_choosing:
    quotes.append(quote.text)
print(autors)
print(quotes)
autor_saying = pd.DataFrame({"Author": autors, "Quotes": quotes})
autor_saying.to_csv("quotes.csv", index=False)
print(autor_saying.head())
browser.quit()
I now have the authors and quotes in a CSV file, which I then read as shown below:
import pandas as pd
from bertopic import BERTopic

model = BERTopic()
summarization = []
data = pd.read_csv("quotes.csv")
print(data.head())
for index, row in data.iterrows():
    topics, probs = model.fit_transform([row['Quotes']])
    print(topics)
This is the result:
            Author                                             Quotes
0  Albert Einstein  “The world as we have created it is a process ...
1     J.K. Rowling  “It is our choices, Harry, that show what we t...
2  Albert Einstein  “There are only two ways to live your life. On...
3      Jane Austen  “The person, be it gentleman or lady, who has ...
4   Marilyn Monroe  “Imperfection is beauty, madness is genius and...
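I would also like to use the BERTopic model to detect the topics of the quotes on this site: Topic modeling
But my code gives me the following error: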
ValueError: Transform unavailable when model was fit with only a single data sample.
Could you please help me solve this problem? How can I detect the topics that appear in the sentences?
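You should train on all the quotes at once rather than one at a time. The error occurs because each call to fit_transform inside the loop receives a list containing a single quote, so the model is fit on one sample. So instead of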
for index, row in data.iterrows():
    topics, probs = model.fit_transform([row['Quotes']])
    print(topics)
try
topics, probs = model.fit_transform(data['Quotes'].tolist())
data['Topic'] = topics
data['Probability'] = probs
print(data.head())
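
If it helps, here is a minimal sketch of the same idea wrapped in two small helpers, so that the single-sample mistake is caught before the model is fit. The names load_quotes, fit_topics and the min_docs threshold are my own assumptions for illustration; they are not part of BERTopic. Only BERTopic(), fit_transform and get_topic_info are actual library calls. Keep in mind that topic models generally want far more text than a single page of quotes, so the topics from such a small file may not be very meaningful.

```python
import csv


def load_quotes(path, column="Quotes"):
    """Return the non-empty values of one CSV column as a list of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        values = ((row.get(column) or "").strip() for row in csv.DictReader(f))
        return [value for value in values if value]


def fit_topics(docs, min_docs=2):
    """Fit BERTopic once on the whole corpus (hypothetical helper).

    min_docs is an assumed lower bound used only for illustration.
    """
    if len(docs) < min_docs:
        raise ValueError(f"need at least {min_docs} documents, got {len(docs)}")
    # Imported lazily so the check above runs even if bertopic is not installed.
    from bertopic import BERTopic

    model = BERTopic()
    topics, probs = model.fit_transform(docs)  # one call on the full list
    return model, topics, probs


# Usage (assumes quotes.csv from the question exists and bertopic is installed):
# docs = load_quotes("quotes.csv")
# model, topics, probs = fit_topics(docs)
# print(model.get_topic_info())  # one row per discovered topic
```

The point of the design is that fit_transform is called exactly once with every document, and the length check fails early with a clear message if the list is too short.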