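I'm trying to scrape data with Selenium from a page where you have to click each round to reveal more data, but I'm very inexperienced with Selenium and can't locate the element I want to scrape.
I'm using Selenium on Google Colab and locating the element by XPath, but it doesn't seem to find it.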
# Assumption: "gs" refers to the google-colab-selenium package
import google_colab_selenium as gs
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = gs.Chrome(options=options)
driver.get('https://dropstab.com/coins/centrifuge/fundraising')
button = driver.find_element(by=By.XPATH, value='/html/body/div/div[1]/div/div[2]/main/div/article/div/div/section/div/div[1]/section[1]/div/div[1]/button')
button.click()
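For reference, if you scroll down the page, the buttons are the individual fundraising rounds (Series A, Venture Round, etc.).
You can get the fundraising information more easily by parsing the JSON data embedded in the page, for example: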
import json
import requests
from bs4 import BeautifulSoup
url = "https://dropstab.com/coins/centrifuge/fundraising"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").text)
fundraising = data["props"]["pageProps"]["coin"]["fundraising"]
# print(json.dumps(fundraising, indent=4))
for s in fundraising["sales"]:
    print(s["name"], s["raised"])
    # ... print other info here
print()
Prints:
Series A 15000000
Venture Round 4000000
Funding Round 3000000
Community Grants None
Early Ecosystem None
Rewards & Grants None
Core Contributors None
Total Backers None
Foundation Endowment None
Development Grants 1800000
Venture Round 4300000
Strategic Round 3700000
Seed 3800000
Main Sale Option 2 8882500
Main Sale Option 1 9350000
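Several rounds in the output above have None as their raised amount, so any totalling or formatting has to skip or label those entries. Here is a minimal sketch of one way to do that. The sample records are copied from the printed output; the helpers total_raised and format_round are hypothetical names, not part of the site's data or of any library, and showing None as "n/a" is an assumption about what a missing amount means.

```python
# A few records copied from the printed output above (name, raised).
sales = [
    {"name": "Series A", "raised": 15000000},
    {"name": "Venture Round", "raised": 4000000},
    {"name": "Funding Round", "raised": 3000000},
    {"name": "Community Grants", "raised": None},
    {"name": "Development Grants", "raised": 1800000},
]


def total_raised(sales):
    """Sum the 'raised' amounts, skipping rounds where it is None or missing."""
    return sum(s["raised"] for s in sales if s.get("raised") is not None)


def format_round(sale):
    """Return 'name: amount', showing 'n/a' when no amount is available."""
    raised = sale.get("raised")
    amount = f"{raised:,}" if raised is not None else "n/a"
    return f"{sale['name']}: {amount}"


for s in sales:
    print(format_round(s))
print("Total:", total_raised(sales))
```

With the real page you would pass fundraising["sales"] in place of the sample list, assuming each entry keeps the "name" and "raised" keys used above.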