How do I scrape my university's leaderboard from GeeksforGeeks?


I have been trying to scrape the leaderboard of a coding platform called GeeksforGeeks.

The code below looks like it should work, but it does not work at all.

import requests
from bs4 import BeautifulSoup


try:

    for page in range(1,3):
        url = 'https://www.geeksforgeeks.org/colleges/lnct-university/students/?page='+str(page)

        r = requests.get(url)

        soup = BeautifulSoup(r.content, 'html.parser')

        # Find all user profile divs
        user_profile_divs = soup.find_all('div', class_='UserCodingProfileCard_userCodingProfileCard__0GQCR')
        
        for user_profile in user_profile_divs:
            # Extract user details
            user_name = user_profile.find('p', class_='UserCodingProfileCard_userCodingProfileCard_dataDiv_data--linkhandle__lZchE').text
            practice_problem = user_profile.find('p', class_='UserCodingProfileCard_userCodingProfileCard_dataDiv_data--value__3A8Kx').text
            coding_score = user_profile.find('p', class_='UserCodingProfileCard_userCodingProfileCard_dataDiv_data--value__3A8Kx').text
            potd_streak = user_profile.find('p', class_='UserCodingProfileCard_userCodingProfileCard_dataDiv_data--value__3A8Kx').text

            # Print the extracted information
            print(f"User Name: {user_name}")
            print(f"Practice Problem: {practice_problem}")
            print(f"Coding Score: {coding_score}")
            print(f"POTD Streak: {potd_streak}")
            print("\n")

except Exception as e:
    print(e)
python web-scraping beautifulsoup
1 Answer

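The problem is that the data you see on the page is loaded as JSON from an external URL, so BeautifulSoup does not see it in the HTML that requests returns.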
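To illustrate why the parser finds nothing, here is a minimal sketch that uses only the standard library. The HTML string is a made-up stand-in for a JavaScript-rendered page shell (it is not the real GeeksforGeeks markup), and the class name `profile-card` is hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page shell: the server sends an empty container plus a script
# that would fill it in the browser. This is NOT the real GeeksforGeeks markup.
STATIC_HTML = """
<html><body>
  <div id="root"></div>
  <script src="/static/app.js"></script>
</body></html>
"""


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_cards(html, class_name="profile-card"):
    """Return how many elements in the static HTML carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


print(count_cards(STATIC_HTML))  # 0: the cards only exist after JavaScript runs
```

An HTML parser only sees what the server sent. If the cards are created later by a script, a class-based lookup on the downloaded HTML returns an empty list, and the loop body never runs.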

To get the data from all pages, you can use the following example:

import pandas as pd
import requests

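# Base endpoint; page number and page size are sent as query parameters
api_url = "https://practiceapi.geeksforgeeks.org/api/v1/institute/9162/students/stats"

page, all_data = 1, []
while True:
    print(f"Page {page}...")
    # Request the current page of results
    params = {"page_size": 10, "page": page}
    data = requests.get(api_url, params=params).json()
    all_data.extend(data["results"])
    # Stop once every record has been collected
    if len(all_data) >= data["count"]:
        break
    page += 1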

df = pd.DataFrame(all_data)
print(df.head())

Prints:

    user_id                handle  coding_score  total_problems_solved  potd_longest_streak
0   5127866        rishav098kumar          2818                    616                  120
1    945492  Abhishek_Kumar_Verma          2540                   1262                    1
2   4592217     anushkasharma2317          2469                    755                  138
3  10388945             sj502pi26          2268                    614                  176
4  10874753           iascode9w7k          2142                    580                  183

...
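The pagination logic can also be separated from the network call, which makes it easy to check offline. The following is a rough sketch, not part of the original answer: `fetch_all`, `fake_fetch`, and `FAKE_ROWS` are hypothetical names, and the fake backend only assumes the response shape used above (a `results` list and a total `count`).

```python
def fetch_all(fetch_page, page_size=10):
    """Collect results from every page.

    fetch_page(page, page_size) is a caller-supplied function returning a dict
    with "results" (a list) and "count" (the total number of records).
    """
    page, all_rows = 1, []
    while True:
        data = fetch_page(page, page_size)
        rows = data.get("results", [])
        all_rows.extend(rows)
        # Stop when everything is collected, or when a page comes back empty
        if not rows or len(all_rows) >= data.get("count", 0):
            break
        page += 1
    return all_rows


# Fake backend standing in for the real API (made-up records).
FAKE_ROWS = [{"handle": f"user{i}", "coding_score": 100 - i} for i in range(23)]


def fake_fetch(page, page_size):
    """Return one slice of FAKE_ROWS in the assumed response shape."""
    start = (page - 1) * page_size
    return {"count": len(FAKE_ROWS), "results": FAKE_ROWS[start:start + page_size]}


rows = fetch_all(fake_fetch)
print(len(rows))  # 23
```

For real use, `fetch_page` would be a small function that calls `requests.get` for the requested page and returns the decoded JSON. The empty-page check is an extra guard so the loop still ends if the reported `count` is ever larger than what the server actually returns.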