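I need to create a Telegram bot that searches for articles on the IEEE Spectrum website after the user enters some keywords. However, every search returns `No results were found for your request.`, even though I checked and the site does have matching articles. Why isn't the bot working?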
```python
import telegram
from telegram.ext import Updater, CommandHandler
import requests
from bs4 import BeautifulSoup

# handler for the /start command
def start(update, context):
    update.message.reply_text(
        "Hello! I'll help you find articles on the IEEE Spectrum website. "
        'Just write /search and the search keywords after that.')

# handler for the /search command
def search(update, context):
    query = " ".join(context.args)
    if query == "":
        update.message.reply_text('To search, you must enter keywords after the /search command')
        return
    # the site where we will search for articles
    url = 'https://spectrum.ieee.org'
    # request the search page using the keywords
    response = requests.get(url + '/search?keywords=' + query)
    if response.status_code == 200:
        # parse the HTML page with BeautifulSoup
        soup = BeautifulSoup(response.content, 'html.parser')
        # look for articles on the search results page
        articles = soup.select('.search-result')
        if len(articles) > 0:
            for article in articles:
                title = article.select_one('.search-result-title a').text
                href = article.select_one('.search-result-title a')['href']
                message = f'{title}\n{url}{href}'
                update.message.reply_text(message)
        else:
            update.message.reply_text('No results were found for your request.')
    else:
        update.message.reply_text('Error when requesting IEEE Spectrum site.')

# create the bot and connect to the Telegram API
bot_token = 'YOUR_BOT_TOKEN'  # token redacted; use the one issued by @BotFather
updater = Updater(token=bot_token, use_context=True)
dispatcher = updater.dispatcher

# register the command handlers
start_handler = CommandHandler('start', start)
search_handler = CommandHandler('search', search)
dispatcher.add_handler(start_handler)
dispatcher.add_handler(search_handler)

# launch the bot
updater.start_polling()
updater.idle()
```
I tried a few things, but nothing worked.
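The problem is that the site renders its search results with JavaScript. `requests` only downloads the HTML the server sends back and does not execute any JavaScript, so it works for static pages but not for this one. You can verify this with curl: `curl -L https://spectrum.ieee.org/search/\?q\=aerospace`. The response is mostly JavaScript that builds the page in the browser, and `requests` cannot run it, so the `.search-result` elements your code looks for never appear in the HTML it receives.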
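To see the difference without touching the network, the sketch below runs the same selector check on two hypothetical HTML strings: one standing in for a JavaScript-driven page as the server sends it, and one for the same page after a browser has run its script. Both strings and the `diagnose` helper are illustrative assumptions, not IEEE Spectrum's real markup.

```python
from bs4 import BeautifulSoup


def diagnose(html, selector=".search-result"):
    """Count elements matching `selector` and <script> tags in raw HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "matches": len(soup.select(selector)),
        "scripts": len(soup.find_all("script")),
    }


# Hypothetical server response: an empty container that JavaScript fills in later
raw_html = """
<html><body>
  <div id="results"></div>
  <script src="/static/search.js"></script>
</body></html>
"""

# Hypothetical markup after a browser has executed the script
rendered_html = """
<html><body>
  <div id="results">
    <div class="search-result">
      <h3 class="search-result-title"><a href="/example-article">Example</a></h3>
    </div>
  </div>
</body></html>
"""

print(diagnose(raw_html))       # {'matches': 0, 'scripts': 1}
print(diagnose(rendered_html))  # {'matches': 1, 'scripts': 0}
```

In the bot you could call `diagnose(response.text)` on the response inside `search`; zero matches next to a handful of `<script>` tags suggests the content is generated client-side.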
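Instead, consider using Selenium with a headless web driver. Selenium launches a real browser instance, so the page's JavaScript runs and the search results are loaded before you read the HTML.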
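A minimal sketch of the Selenium side, assuming Selenium 4 and a local Chrome installation. `build_search_url` and `fetch_rendered_html` are hypothetical helper names; the `/search/?q=` pattern is taken from the curl command above, and the default `.search-result` wait selector is copied from the question's code, so check both against the rendered page in your browser's developer tools.

```python
from urllib.parse import quote_plus


def build_search_url(query, base_url="https://spectrum.ieee.org"):
    # Assumption: URL pattern taken from the curl example (/search/?q=...)
    return f"{base_url}/search/?q={quote_plus(query)}"


def fetch_rendered_html(url, wait_css=".search-result", timeout=15):
    """Load `url` in headless Chrome, wait for `wait_css` to appear,
    and return the HTML after JavaScript has run (driver.page_source)."""
    # Imported here so the rest of the module loads even without Selenium installed
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        try:
            # Wait until the page's scripts have inserted at least one result element
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
            )
        except TimeoutException:
            pass  # nothing matched in time: return the page as rendered so far
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if an error occurred
```

`fetch_rendered_html(build_search_url("aerospace"))` returns HTML you can pass to `BeautifulSoup` the same way the question's code passes `response.content`. Because a timeout still returns the page, an empty result list (rather than an exception) is the sign that the selector does not match the real markup.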
The overall flow of your program can stay the same; only the web-scraping part of the code needs to change.
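One way to keep the bot's flow intact is to make the fetching step a parameter. In this sketch, `search_articles` and `make_search_handler` are hypothetical names, the handler uses the python-telegram-bot v13-style `(update, context)` signature from the question, and `fetch_html` is any function that takes a URL and returns rendered HTML, for example a Selenium wrapper that returns `driver.page_source`. The selectors are again the question's and may need adjusting.

```python
from urllib.parse import quote_plus, urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://spectrum.ieee.org"
# Assumption: selectors copied from the question; verify them against the rendered page
ITEM_SELECTOR = ".search-result"
LINK_SELECTOR = ".search-result-title a"


def search_articles(query, fetch_html):
    """Return (title, absolute_url) pairs; fetch_html(url) must return rendered HTML."""
    url = f"{BASE_URL}/search/?q={quote_plus(query)}"
    soup = BeautifulSoup(fetch_html(url), "html.parser")
    results = []
    for item in soup.select(ITEM_SELECTOR):
        link = item.select_one(LINK_SELECTOR)
        if link is None or not link.get("href"):
            continue  # skip results without a usable link
        # urljoin copes with both relative and absolute hrefs
        results.append((link.get_text(strip=True), urljoin(BASE_URL, link["href"])))
    return results


def make_search_handler(fetch_html):
    """Build a /search handler with the same flow as the question's version."""
    def search(update, context):
        query = " ".join(context.args)
        if not query:
            update.message.reply_text("To search, you must enter keywords after the /search command")
            return
        try:
            articles = search_articles(query, fetch_html)
        except Exception:  # broad on purpose: any browser or network failure
            update.message.reply_text("Error when requesting IEEE Spectrum site.")
            return
        if not articles:
            update.message.reply_text("No results were found for your request.")
            return
        for title, link in articles:
            update.message.reply_text(f"{title}\n{link}")
    return search
```

Register it as before, e.g. `dispatcher.add_handler(CommandHandler('search', make_search_handler(my_fetcher)))`, where `my_fetcher` stands for your Selenium-based function. Passing the fetcher in also lets you test the parsing with canned HTML without starting a browser.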
See the Selenium documentation to learn more.