IP blocked in a LinkedIn scraper using Python Scrapy

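I am a student learning the Scrapy framework. I am trying to scrape the connections of LinkedIn profiles, but I am being blocked. I have integrated Zyte Smart Proxy and now receive a 523 error. Please help me get past this.

How can I scrape LinkedIn profile connection data?

My code: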

import scrapy
from linkedinprofile.loginlinkedin import loginSitesHandler

from scrapy_splash import SplashRequest
from scrapy.http import FormRequest


class profile_connectionsSpider(scrapy.Spider):
    name = "profile_connections"

    def start_requests(self):
        # Pages to request: the LinkedIn home page and one profile.
        profile_list = [
            'https://www.linkedin.com/home',
            'https://www.linkedin.com/in/darsh-turakhia-011000195/',
        ]

        for profile in profile_list:
            yield scrapy.Request(url=profile, callback=self.parse)

    def parse(self, response):
        # Save the raw response body so the returned page can be inspected.
        with open('response.html', 'wb') as f:
            f.write(response.body)
        # Print the profile heading (the XPath depends on a generated "ember" id).
        print(response.xpath('//*[@id="ember255"]/div[2]/div[2]/div[1]/div[1]/h1').get())

Output: the domain is blocked. Here you can see what the 523 error means
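
To make the 523 case easier to spot while debugging, here is a minimal sketch of a status classifier. It assumes, as the output above suggests, that 523 means the proxy refuses the domain, so retrying will not help. The helper name `classify_status` and the set of "retryable" codes are hypothetical choices for illustration and are not taken from the question.

```python
# Hypothetical helper: decide how to treat a proxied response by status code.

# Assumption: these codes are usually transient (rate limits, gateway errors).
RETRYABLE_STATUSES = {429, 502, 503, 504}

# Per the output above, 523 is reported as "domain is blocked".
BLOCKED_STATUSES = {523}


def classify_status(status: int) -> str:
    """Return 'ok', 'blocked', 'retry', or 'error' for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    if status in BLOCKED_STATUSES:
        # The proxy refuses the domain, so repeating the request is pointless.
        return "blocked"
    if status in RETRYABLE_STATUSES:
        return "retry"
    return "error"


if __name__ == "__main__":
    for code in (200, 503, 523, 404):
        print(code, classify_status(code))
```

By default Scrapy does not pass non-2xx responses to the callback. Setting the spider attribute `handle_httpstatus_list = [523]` should let `parse` receive the 523 response, where a check like the one above can log the block instead of writing an error page to `response.html`.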

python scrapy linkedin