A Telegram chatbot written in Python does not find articles on the IEEE Spectrum website


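I need to create a bot that searches the IEEE Spectrum website for articles after the user enters keywords. The bot must run in Telegram. But whenever I search, the bot always replies

No results were found for your request.

even though I checked and the site does have matching articles. Why doesn't the bot work?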

import telegram
from telegram.ext import Updater, CommandHandler
import requests
from bs4 import BeautifulSoup

# handler for the /start command
def start(update, context):
    update.message.reply_text(
        "Hello! I'll help you find articles on the IEEE Spectrum website. "
        'Just write /search and the search keywords after that.')

# handler for the /search command; the keywords arrive in context.args
def search(update, context):
    query = " ".join(context.args)
    if query == "":
        update.message.reply_text('To search, you must enter keywords after the /search command')
        return

    # the site where we will search for articles
    url = 'https://spectrum.ieee.org'
    # request the search page, letting requests URL-encode the keywords
    response = requests.get(url + '/search', params={'keywords': query})

    if response.status_code == 200:
        # parsing html page using BeautifulSoup
        soup = BeautifulSoup(response.content, 'html.parser')

        # looking for articles on the search results page
        articles = soup.select('.search-result')
        if len(articles) > 0:
            for article in articles:
                title = article.select_one('.search-result-title a').text
                href = article.select_one('.search-result-title a')['href']
                message = f'{title}\n{url}{href}'
                update.message.reply_text(message)
        else:
            update.message.reply_text('No results were found for your request.')
    else:
        update.message.reply_text('Error when requesting IEEE Spectrum site.')

# creating a bot and connecting to the Telegram API
bot_token = 'YOUR_BOT_TOKEN'  # token issued by @BotFather; never publish the real value
updater = Updater(token=bot_token, use_context=True)
dispatcher = updater.dispatcher

# adding command and text message handlers
start_handler = CommandHandler('start', start)
search_handler = CommandHandler('search', search)
dispatcher.add_handler(start_handler)
dispatcher.add_handler(search_handler)

# launch a bot
updater.start_polling()
updater.idle()

I tried a few things, but nothing worked.

python telegram-bot chatbot ieee
1 Answer

The problem is that the site relies on JavaScript.

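requests only retrieves the HTML the server sends and does not execute JavaScript, so it does not work for this site. You can verify this with curl:

curl -L https://spectrum.ieee.org/search/\?q\=aerospace

The response contains JavaScript, which requests cannot run, so the search results never appear in the HTML the bot parses.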
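The same check can be done from Python. The following is a minimal sketch, not part of the original answer: it counts how many elements in a piece of HTML carry a given CSS class, using only the standard library. The helper names ClassCounter and count_class are hypothetical, and the class name search-result is simply the one the question's code looks for.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in html have class_name among their classes."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count
```

Passing response.text from the bot's requests.get call to count_class(response.text, "search-result") should return 0 if the results are indeed rendered by JavaScript, which would explain why the bot always takes the "No results" branch.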

Instead, you may want to use a headless web driver with Selenium. Selenium launches an actual browser instance, so the JavaScript runs and the search results load.

The overall flow of your program should stay the same; you only need to change the web-scraping part of the code.
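A rough sketch of what the replacement scraping part could look like, under several assumptions: Selenium 4 and a local Chrome install are available, the search URL follows the /search/?q= form from the curl command above, and the selector .search-result-title a (copied from the question) actually matches the rendered page, which has not been verified. The function names build_search_url and fetch_results are hypothetical.

```python
from urllib.parse import quote_plus

BASE_URL = "https://spectrum.ieee.org"
# Assumption: selector copied from the question; check it against the rendered page.
RESULT_LINK_SELECTOR = ".search-result-title a"


def build_search_url(query):
    """URL-encode the keywords so spaces and symbols survive the request."""
    return f"{BASE_URL}/search/?q={quote_plus(query)}"


def fetch_results(query, timeout=10):
    """Return (title, href) pairs from the JavaScript-rendered results page."""
    # Imported here so the URL helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(build_search_url(query))
        try:
            # Wait until the page's JavaScript has inserted at least one result link.
            links = WebDriverWait(driver, timeout).until(
                EC.presence_of_all_elements_located(
                    (By.CSS_SELECTOR, RESULT_LINK_SELECTOR)
                )
            )
        except TimeoutException:
            return []  # nothing matched the selector before the timeout
        # The browser typically reports href already resolved to an absolute URL.
        return [(link.text, link.get_attribute("href")) for link in links]
    finally:
        driver.quit()  # always close the browser, even on errors
```

Inside the search handler, the requests and BeautifulSoup block could then be replaced by a loop over fetch_results(query) that sends each title and href with update.message.reply_text, keeping the existing "No results" reply for an empty list. Starting a browser per request is slow, so a longer-lived driver may be worth considering once the basic flow works.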

You can learn more about Selenium from its documentation.
