Where does this university website pull its event data from? (Python web scraping)


Hi everyone 👋

I'm trying to build a Discord bot (in Python) that scrapes all the event data from my university's events page and posts it to a Discord channel.

This is the URL I'm trying to scrape: https://www.mmu.ac.uk/student-life/events/

I've been able to scrape it (using requests and bs4), but only the data that comes with the initial page load. Beyond that, I'm stuck on how to get the rest.

How can I scrape the data that only appears after clicking the "View more events" button?

I've tried looking through the Network tab in the developer tools, but I'm struggling to find anything useful.
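From what I've read, if the button triggers an XHR/fetch request, you can often call that endpoint directly with requests once you spot it in the Network tab. I imagine it would look roughly like this - everything below is a made-up placeholder, since I haven't actually found the real endpoint:

# Purely illustrative: if the Network tab shows an XHR/fetch request firing when the button
# is clicked, the idea would be to call that endpoint directly instead of re-parsing the HTML.
# The URL and parameters here are made-up placeholders, not the real MMU endpoint.
import requests

HYPOTHETICAL_ENDPOINT = 'https://www.mmu.ac.uk/example-events-api'  # placeholder only
params = {'page': 2}  # placeholder paging parameter

response = requests.get(HYPOTHETICAL_ENDPOINT, params=params, timeout=10)
response.raise_for_status()
data = response.json()  # assuming the endpoint returns JSON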

Sorry if this is a newbie question - I'm a new CS student, so any support, advice, or resources would be much appreciated!

Thanks!! 🙏

My code so far:

from discord.ext import commands
import discord
import requests
from bs4 import BeautifulSoup

# Instantiating the bot
BOT_TOKEN = 'ABCDE'
CHANNEL_ID = 12345
bot = commands.Bot(command_prefix='!', intents=discord.Intents.all())

#event handler that should be called when the bot is ready and connected to Discord.
@bot.event
async def on_ready():
    print("Bot is connected, online and working.")
    channel = bot.get_channel(CHANNEL_ID)
    await channel.send("Hello! Work is underway on my project bot!")

  
#command that responds to '!hello'
@bot.command()
async def hello(ctx):
    await ctx.send("Hello!")


#command that takes in any number of integers and returns the result
@bot.command()
async def add(ctx, *arr):
    result = 0
    for i in arr:
        result += int(i)

    await ctx.send(f"Result: {result}")


# This command sends a list of upcoming events at Manchester Metropolitan University to the channel.
# It fetches the events page of the MMU website, scrapes the relevant info (Title, date & location) using Beautiful Soup,
# and formats it into a string before sending it as a message. 
@bot.command()
async def scrape(ctx):
    url = 'https://www.mmu.ac.uk/student-life/events/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.google.com/',
        'Connection': 'keep-alive'
    }
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    event_list = soup.find('ul', class_='listing--events')
    event_items = event_list.find_all('li', class_='event-list__item')

    events = []
    for item in event_items:
        event_name = item.find('h3', class_='event-list__title').text.strip()
        event_date = item.find('p', class_='event-list__date-start').text.strip()
        event_location = item.find('p', class_='event-list__location').text.strip()
        events.append(f"{event_date}\n{event_name}\n{event_location}")

    await ctx.send("\n\n".join(events))

bot.run(BOT_TOKEN)
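
One fallback I've been considering (but haven't tried yet) is driving a headless browser so the button actually gets clicked before I parse the page. Something like this rough Playwright sketch - the button selector is just a guess and would need checking against the real page's HTML:

# Untested sketch: use a headless browser to click "View more events" until it's gone,
# then hand the final HTML to BeautifulSoup. The button selector is a guess.
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_all_events(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Keep clicking the button until it no longer appears (guessed text selector).
        while True:
            button = page.query_selector("text=View more events")
            if button is None:
                break
            button.click()
            page.wait_for_timeout(1000)  # crude wait for the new items to render

        html = page.content()
        browser.close()

    # Class names reused from my existing scrape() command above.
    soup = BeautifulSoup(html, 'html.parser')
    events = []
    for item in soup.find_all('li', class_='event-list__item'):
        title = item.find('h3', class_='event-list__title')
        if title:
            events.append(title.text.strip())
    return events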


python web-scraping beautifulsoup python-requests endpoint