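I'm trying to extract the participants' names and the number of modules they completed from this website into a CSV file: https://learn.microsoft.com/training/challenges?id=f66f0d57-d644-44d1-9faf-112b18a0ef92
Below is my code. It worked a few days ago, but then I tried to modify it and now it doesn't work at all.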
from selenium import webdriver
import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
# Initialize the web driver
driver = webdriver.Chrome()
# URL of the website
url = "https://learn.microsoft.com/training/challenges?id=f66f0d57-d644-44d1-9faf-112b18a0ef92"
driver.get(url)
# Locate participant names and modules completed elements
participant_names = driver.find_element(By.CSS_SELECTOR, ".is-hidden-mobile.leaderboard-name")
modules_completed = driver.find_element(By.CSS_SELECTOR, "span")
# Extract data and store it in a list
data = []
for name, modules in zip(participant_names, modules_completed):
    data.append([name.text, modules.text])
# Define the CSV file name
csv_file_name = 'participants.csv'
# Write data to a CSV file
with open(csv_file_name, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(['Participant Name', 'Modules Completed'])  # Write header
    csvwriter.writerows(data)
# Close the browser
driver.quit()
print(f"Data has been scraped and saved to {csv_file_name}.")
This is the markup for the name, from Inspect Element:
<span class="is-hidden-mobile leaderboard-name"><!---->Abhishek Kumar<!----></span>
And for the modules completed:
<span><!---->12/12<!----></span>
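There is also pagination. I tried to handle it but removed that part because of errors, and even the basic first-page code still doesn't work. The pagination markup: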
<button type="button" class="pagination-link is-current" data-page="1" aria-label="Page 1 of 4" aria-current="true">
1
</button>
I'd be very grateful for any code or pointers that get this working without errors.
The error I get is as follows:
Traceback (most recent call last):
File "C:\Users\misss\AppData\Local\Programs\Python\Python311\wspmlsa.py", line 17, in <module>
participant_names = driver.find_element(By.CSS_SELECTOR, ".is-hidden-mobile.leaderboard-name")
File "C:\Users\misss\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 738, in find_element
return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
File "C:\Users\misss\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 344, in execute
self.error_handler.check_response(response)
File "C:\Users\misss\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".is-hidden-mobile.leaderboard-name"}
(Session info: chrome=117.0.5938.134); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
GetHandleVerifier [0x00007FF65ADD7D12+55474]
(No symbol) [0x00007FF65AD477C2]
(No symbol) [0x00007FF65ABFE0EB]
(No symbol) [0x00007FF65AC3EBAC]
(No symbol) [0x00007FF65AC3ED2C]
(No symbol) [0x00007FF65AC79F77]
(No symbol) [0x00007FF65AC5F19F]
(No symbol) [0x00007FF65AC77EF2]
(No symbol) [0x00007FF65AC5EF33]
(No symbol) [0x00007FF65AC33D41]
(No symbol) [0x00007FF65AC34F84]
GetHandleVerifier [0x00007FF65B13B762+3609346]
GetHandleVerifier [0x00007FF65B191A80+3962400]
GetHandleVerifier [0x00007FF65B189F0F+3930799]
GetHandleVerifier [0x00007FF65AE73CA6+694342]
(No symbol) [0x00007FF65AD52218]
(No symbol) [0x00007FF65AD4E484]
(No symbol) [0x00007FF65AD4E5B2]
(No symbol) [0x00007FF65AD3EE13]
BaseThreadInitThunk [0x00007FFE68417344+20]
RtlUserThreadStart [0x00007FFE6A2226B1+33]
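The data is pulled into the page through an API. Why not scrape that API endpoint directly? You can find it in your browser's Dev Tools --> Network tab.
Here is one way to do it: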
import requests
import pandas as pd
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
}
r = requests.get("https://learn.microsoft.com/api/challenges/f66f0d57-d644-44d1-9faf-112b18a0ef92/leaderboard?$top=1000&$skip=0&locale=en-gb", headers=headers)
df = pd.json_normalize(r.json(), record_path=['results'])
print(df[['rank', 'score', 'userDisplayName']])
Result in the terminal:
rank score userDisplayName
0 1 12.0 _11KRISHNA VAMSI
1 2 12.0 ABDUL SAMAD KHAN
2 3 12.0 Abhishek Kumar
3 4 12.0 Aditya Srivastav
4 5 12.0 Akshay Gupta
5 6 12.0 Harshvardhan Nayakal
6 7 12.0 Khushi
7 8 12.0 Kreeti Jindal
8 9 12.0 Md Ibrahim Noman
9 10 12.0 MUKESH PAL
10 11 12.0 Patel Harsh Satishkumar
11 12 12.0 Prashant Dwivedi
12 13 12.0 Rudraraju Sriya
13 14 12.0 Sagar Chintamani
14 15 12.0 Samvarthika . C
15 16 12.0 Sandeep Kumar Patel
16 17 12.0 Shashank Kumar Srivastava
17 18 12.0 Shivanshu_Nigam
18 19 12.0 Smriti Tiwari
19 20 12.0 udit kumar singh
20 21 12.0 Viraj Bhutada
21 22 7.0 Akanksha Pal
22 23 2.0 Md Tawsif Mahmud Toha
23 24 0.0 Asritha mudunuri
24 25 0.0 Chandramani kumari
25 26 0.0 Kandukuri Jaswanth
26 27 0.0 Lilanjan Barman
27 28 0.0 Nelissa
28 29 0.0 Paresh Maheshwari
29 30 0.0 prachothan reddy kuthuru
30 31 0.0 Radhika Garg
31 32 0.0 Roopesh Ranjan
32 33 0.0 Roopesh Ranjan
33 34 0.0 sayyid hassan shaabani
34 35 0.0 SEELAM ALEXANDER
35 36 0.0 Tamilarasan S
36 37 0.0 Vashu Agarwal
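You can explore that JSON response and perhaps look at the other data it contains.
The code below works fine for extracting the page it loads; I'm not sure about the other pages!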
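If a CSV file is the goal, the same endpoint can be read in slices and written out with only the standard library. The following is a minimal sketch, not a definitive implementation: the helper names (`fetch_page`, `fetch_all`, `write_csv`) and the page size of 100 are made up for illustration, and it assumes that `score` is the number of modules completed (as the 12.0 values above suggest) and that a slice shorter than `$top` means the end of the leaderboard has been reached.

```python
import csv
import json
import urllib.parse
import urllib.request

# Endpoint copied from the answer above; only $top and $skip are varied here.
BASE_URL = ("https://learn.microsoft.com/api/challenges/"
            "f66f0d57-d644-44d1-9faf-112b18a0ef92/leaderboard")
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_page(skip, top):
    """Fetch one slice of the leaderboard and return its 'results' list."""
    query = urllib.parse.urlencode(
        {"$top": top, "$skip": skip, "locale": "en-gb"}, safe="$")
    request = urllib.request.Request(f"{BASE_URL}?{query}", headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("results", [])


def fetch_all(get_page=fetch_page, top=100):
    """Request successive slices until one comes back shorter than `top`.

    Assumption: a short (or empty) slice marks the end of the leaderboard.
    """
    rows, skip = [], 0
    while True:
        batch = get_page(skip, top)
        rows.extend(batch)
        if len(batch) < top:
            return rows
        skip += top


def write_csv(rows, path):
    """Write the display name and score of each result to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Participant Name", "Modules Completed"])
        for row in rows:
            writer.writerow([row["userDisplayName"], row["score"]])
```

Calling `write_csv(fetch_all(), "participants.csv")` would perform the actual requests. `fetch_all` takes the page fetcher as a parameter so the paging logic can be tried out on sample data without touching the network.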
from selenium import webdriver
import csv
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Initialize the web driver (choose the appropriate driver for your browser)
driver = webdriver.Chrome() # Use Chrome or other browsers you prefer
# URL of the website
url = "https://learn.microsoft.com/en-us/training/challenges?id=f66f0d57-d644-44d1-9faf-112b18a0ef92&wt.mc_id=studentamb_106710"
driver.get(url)
WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.is-hidden-mobile.leaderboard-name")))
# Locate participant names and modules completed elements
participant_names = driver.find_elements("css selector", "span.is-hidden-mobile.leaderboard-name")
modules_completed = driver.find_elements("css selector", "div.level-right.leaderboard-score-over-total span")
# Extract data and store it in a list
data = []
for name, modules in zip(participant_names, modules_completed):
    data.append([name.text, modules.text])
# Define the CSV file name
csv_file_name = 'participants.csv'
# Write data to a CSV file
with open(csv_file_name, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(['Participant Name', 'Modules Completed'])  # Write header
    csvwriter.writerows(data)
# Close the browser
driver.quit()
print(f"Data has been scraped and saved to {csv_file_name}.")
I'm sharing this in case it helps if you are running into problems with the basic code!
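For the remaining pages, one possible approach is to click through the pagination buttons shown in the question (`button.pagination-link` with a `data-page` attribute and an `aria-label` such as "Page 1 of 4"). The sketch below is untested against the live site and rests on assumptions: the helper names are hypothetical, it assumes a button for every page is present in the DOM, and it leaves the "wait until the new page is shown" step to a caller-supplied function, because how quickly the rows refresh after a click is not something the markup above tells us.

```python
import re

# Selectors taken from the markup and code earlier in this thread.
NAME_SELECTOR = "span.is-hidden-mobile.leaderboard-name"
SCORE_SELECTOR = "div.level-right.leaderboard-score-over-total span"
PAGE_BUTTON_SELECTOR = "button.pagination-link[data-page]"


def total_pages(driver):
    """Read the page count from an aria-label such as 'Page 1 of 4'."""
    for button in driver.find_elements("css selector", PAGE_BUTTON_SELECTOR):
        match = re.search(r"of (\d+)", button.get_attribute("aria-label") or "")
        if match:
            return int(match.group(1))
    return 1  # no pagination buttons found: assume a single page


def read_rows(driver):
    """Pair each visible name with its 'completed/total' text."""
    names = driver.find_elements("css selector", NAME_SELECTOR)
    scores = driver.find_elements("css selector", SCORE_SELECTOR)
    return [[name.text, score.text] for name, score in zip(names, scores)]


def scrape_all_pages(driver, wait_until_current):
    """Visit every page in order and collect its rows.

    wait_until_current(page) is supplied by the caller and should block
    until the given page number is the one being displayed.
    """
    rows = []
    for page in range(1, total_pages(driver) + 1):
        if page > 1:
            selector = f'button.pagination-link[data-page="{page}"]'
            driver.find_element("css selector", selector).click()
        wait_until_current(page)
        rows.extend(read_rows(driver))
    return rows
```

With Selenium, `wait_until_current` could be a small function that calls `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'button.pagination-link.is-current[data-page="{page}"]')))`, relying on the `is-current` class seen in the question's markup. If the rows lag behind the button state, a stricter wait would be needed.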