IP blocked in LinkedIn scraper using Python Scrapy

Question

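I am a student learning the Scrapy framework and am trying to scrape the connections of LinkedIn profiles, but I am being blocked. I have integrated Zyte Smart Proxy and receive a 523 error. Please help me get past this.

How can I scrape LinkedIn profile connection data?

My code: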

import scrapy
from linkedinprofile.loginlinkedin import loginSitesHandler

from scrapy_splash import SplashRequest
from scrapy.http import FormRequest


class profile_connectionsSpider(scrapy.Spider):
    name = "profile_connections"

    def start_requests(self):
        # Pages to request
        profile_list = [
            'https://www.linkedin.com/home',
            'https://www.linkedin.com/in/darsh-turakhia-011000195/'
        ]

        for profile in profile_list:
            yield scrapy.Request(url=profile, callback=self.parse)

    def parse(self, response):
        # Save the raw response body for inspection
        with open('response.html', 'wb') as f:
            f.write(response.body)
        # Print the profile heading element, or None if it is not found
        print(response.xpath('//*[@id="ember255"]/div[2]/div[2]/div[1]/div[1]/h1').get())

Output: the domain is blocked.
