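Hi all 👋
I'm trying to create a Discord bot (in Python) that pulls all the event data from my university's events page and sends it to a Discord channel.
This is the URL I'm trying to scrape: https://www.mmu.ac.uk/student-life/events/
I've managed to scrape it (using requests and bs4), but only the data that is present on the initial page load. Beyond that, I'm stuck on how to get the rest.
How can I scrape the data that appears after clicking the "View more events" button?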
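A "View more" button usually works in one of two ways: it either fires a background (XHR/fetch) request for the next batch of events, or it reveals items that were already in the HTML. In the first case, once the underlying request has been identified, the common pattern is to keep requesting successive pages until one comes back empty. The sketch below shows only that loop. `fetch_page` is a hypothetical callable standing in for whatever request the button actually makes (the MMU site's real endpoint and parameters are not known here), which also makes the loop easy to test with fake data.

```python
from typing import Callable, List


def collect_all_pages(
    fetch_page: Callable[[int], List[str]],
    start_page: int = 1,
    max_pages: int = 50,
) -> List[str]:
    """Call fetch_page(1), fetch_page(2), ... until a page returns no items.

    fetch_page is a hypothetical stand-in for the request sent by the
    "View more events" button; it returns the events on that page.
    max_pages is a safety cap so a misbehaving endpoint can't loop forever.
    """
    all_items: List[str] = []
    for page in range(start_page, start_page + max_pages):
        items = fetch_page(page)
        if not items:  # an empty page means there is nothing more to load
            break
        all_items.extend(items)
    return all_items


# Usage with fake data standing in for real HTTP responses
fake_site = {1: ["Event A", "Event B"], 2: ["Event C"], 3: []}
events = collect_all_pages(lambda page: fake_site.get(page, []))
print(events)
```

In a real bot, `fetch_page` would wrap a `requests.get` call to the URL that shows up under the Fetch/XHR filter of the Network tab when the button is clicked, then parse the response (JSON or an HTML fragment). If clicking the button triggers no new request at all, the content may be rendered client-side, and a browser automation tool such as Selenium or Playwright, which can click the button itself, is the usual fallback.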
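I've tried looking at the Network tab in the developer tools, but I'm struggling to find anything useful.
Sorry if this is a beginner question. I'm a new CS student, so any support, advice or resources would be greatly appreciated!
Thanks!! 🙏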
Code so far:
```python
import asyncio

import discord
import requests
from bs4 import BeautifulSoup
from discord.ext import commands

# Instantiating the bot (placeholder token and channel ID)
BOT_TOKEN = 'ABCDE'
CHANNEL_ID = 12345
bot = commands.Bot(command_prefix='!', intents=discord.Intents.all())


# Event handler called when the bot is ready and connected to Discord.
@bot.event
async def on_ready():
    print("Bot is connected, online and working.")
    channel = bot.get_channel(CHANNEL_ID)
    # get_channel returns None if the ID is wrong or the channel isn't cached
    if channel is not None:
        await channel.send("Hello! Work is underway on my project bot!")


# Command that responds to '!hello'
@bot.command()
async def hello(ctx):
    await ctx.send("Hello!")


# Command that takes in any number of integers and replies with their sum, e.g. '!add 1 2 3'
@bot.command()
async def add(ctx, *arr):
    result = 0
    for i in arr:
        result += int(i)
    await ctx.send(f"Result: {result}")


# This command sends a list of upcoming events at Manchester Metropolitan University to the channel.
# It fetches the events page of the MMU website, scrapes the relevant info (title, date & location)
# using Beautiful Soup, and formats it into a string before sending it as a message.
@bot.command()
async def scrape(ctx):
    url = 'https://www.mmu.ac.uk/student-life/events/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.google.com/',
        'Connection': 'keep-alive'
    }
    # requests is blocking, so run it in a worker thread (Python 3.9+) to avoid freezing the bot
    response = await asyncio.to_thread(requests.get, url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    event_list = soup.find('ul', class_='listing--events')
    if event_list is None:
        await ctx.send("Couldn't find the events list on the page.")
        return

    events = []
    for item in event_list.find_all('li', class_='event-list__item'):
        name = item.find('h3', class_='event-list__title')
        date = item.find('p', class_='event-list__date-start')
        location = item.find('p', class_='event-list__location')
        # Skip items missing a field rather than crashing on None.text
        if name is None or date is None or location is None:
            continue
        events.append(f"{date.text.strip()}\n{name.text.strip()}\n{location.text.strip()}")

    # Discord rejects empty messages, so send a fallback when nothing was found
    await ctx.send("\n\n".join(events) or "No events found.")


bot.run(BOT_TOKEN)
```
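Once more events are loaded, the joined string passed to `ctx.send` can easily exceed Discord's 2,000-character limit for a single message, and the send will fail. One way around this is to pack the event blocks into chunks that each stay under the limit and send them one at a time. `chunk_messages` below is a hypothetical helper written for this post, not part of discord.py.

```python
from typing import Iterable, List

DISCORD_MAX_CHARS = 2000  # Discord's per-message content limit


def chunk_messages(blocks: Iterable[str], limit: int = DISCORD_MAX_CHARS,
                   sep: str = "\n\n") -> List[str]:
    """Join text blocks with `sep`, starting a new chunk whenever adding the
    next block would push the current chunk over `limit` characters.
    A single block longer than `limit` is hard-split; empty blocks are skipped."""
    chunks: List[str] = []
    current = ""
    for block in blocks:
        # Split any block that can't fit in one message on its own
        pieces = [block[i:i + limit] for i in range(0, len(block), limit)]
        for piece in pieces:
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= limit:
                current = candidate
            else:
                # current is non-empty here, because a lone piece always fits
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

In `scrape`, the final send would then become a loop such as `for chunk in chunk_messages(events): await ctx.send(chunk)`. Splitting on event boundaries keeps each date, title and location together in the same message.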
I've tried checking the Network tab in the developer tools, but couldn't find anything that pointed me in a useful direction.
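A practical way to use the Network tab here is to filter it by Fetch/XHR, click the button, and look for the request that appears; right-clicking it lets you copy its URL so it can be replayed with `requests`. If that URL carries a page or offset query parameter, only that value needs to change between calls. The sketch below uses the standard library to set such a parameter. The example endpoint and the parameter name `page` are placeholders, since the real ones for the MMU site are not known here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, name: str, value) -> str:
    """Return `url` with query parameter `name` set to `value`,
    keeping every other query parameter unchanged."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[name] = [str(value)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical URL copied from the Network tab; the real endpoint may differ.
base = "https://example.com/api/events?category=student&page=1"
for page in range(1, 4):
    print(with_query_param(base, "page", page))
```

Each generated URL can then be fetched with `requests.get` in turn. Depending on what the Network tab shows, the response is typically either JSON (read with `response.json()`) or an HTML fragment that BeautifulSoup can parse the same way as the full page.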