Web Scraping LinkedIn


I am currently working on a university project that scrapes LinkedIn with Selenium. Here is the code:

from selenium import webdriver
from time import sleep
from selenium.webdriver.common.keys import Keys
from parsel import Selector

driver = webdriver.Chrome('location of web driver')
driver.get('https://www.linkedin.com')

# username
username = driver.find_element_by_id('session_key')
username.send_keys('Linkedin Username')
sleep(0.5)

# password
password = driver.find_element_by_id('session_password')
password.send_keys('Linkedin Password')
sleep(0.5)

# submit the login form
sign_in_button = driver.find_element_by_xpath('//*[@type="submit"]')
sign_in_button.click()
sleep(0.5)

driver.get('https://www.google.com/')   # navigate to Google to search for the profile

# locate search form by_name
search_query = driver.find_element_by_name('q')

# send_keys() to simulate the search text key strokes
search_query.send_keys('https://www.linkedin.com/in/khushi-thakkar-906b56188/')
sleep(0.5)

search_query.send_keys(Keys.RETURN)
sleep(3)

# locate the first search result (yuRUbf is a Google results container class)
search_person = driver.find_element_by_class_name('yuRUbf')
search_person.click()

#Experience
experience = driver.find_elements_by_css_selector('#experience-section .pv-profile-section')
for item in experience:
    print(item.text)
    print("")

#Education
education = driver.find_elements_by_css_selector('#education-section .pv-profile-section')
for item in education:
    print(item.text)
    print("")

#Certification
certification = driver.find_elements_by_css_selector('#certifications-section .pv-profile-section')
for item in certification:
    print(item.text)
    print("")

When I scrape the Experience section, it extracts the information perfectly. But when I do the same for the Education and Certification sections, each returns an empty list. Please help!

selenium selenium-webdriver web-scraping webdriver linkedin-api
2 Answers
0 votes

I think the problem is your CSS selectors. I tried them myself, and they could not find any elements in the HTML body.

Fix your CSS selectors and you should be fine:

#Education
education = driver.find_elements_by_css_selector('#education-section li')

#Certification
certification = driver.find_elements_by_css_selector('#certifications-section li')
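To show these selectors in context, here is a minimal usage sketch in the same (now-deprecated) Selenium 3 style as the question. The scroll step is my own assumption: LinkedIn lazy-loads the lower profile sections, so the education and certification nodes may not exist in the DOM until the page has been scrolled.

# Sketch only: assumes the deprecated find_elements_by_* API used in
# the question, and that scrolling is needed to trigger LinkedIn's
# lazy loading of the lower profile sections.
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
sleep(2)  # crude wait for the lazy-loaded sections to render

# Education
for item in driver.find_elements_by_css_selector('#education-section li'):
    print(item.text)
    print("")

# Certification
for item in driver.find_elements_by_css_selector('#certifications-section li'):
    print(item.text)
    print("")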

0 votes

I have been working on LinkedIn scraping for a few weeks now, and the way I find all the information is as follows. You have to consider how the HTML is built and analyze the ul and li elements:

target_element = soup.find('div',id='experience').findNext('div',class_='pvslist__outer-container')

lists = target_element.findAll('li',class_='artdeco-list__item')

Analyze how the id appears in the different parts of the HTML you want to scrape; I hope this helps.
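The snippet above presupposes an existing BeautifulSoup soup object. As a minimal, hedged sketch of how it could be wired to the Selenium session from the question (the class names are copied verbatim from this answer and tend to change whenever LinkedIn updates its markup):

# Sketch: build the soup from Selenium's rendered page source,
# then walk the list items exactly as the answer describes.
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')

# class names taken verbatim from the answer above; they are
# LinkedIn-version-specific and may need updating
target_element = soup.find('div', id='experience').findNext('div', class_='pvslist__outer-container')
lists = target_element.findAll('li', class_='artdeco-list__item')

for li in lists:
    # flatten each entry to one line of plain text
    print(li.get_text(separator=' ', strip=True))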
