I'm trying to build an app that can like a post from several profiles at the same time. I'm trying to parallelize the logins and, for each logged-in user, like the posts in parallel, as shown below.
with ProcessPoolExecutor() as exe:
    bot = Insta()
    results = []
    for credential in credentials:  # go through credentials, log in in parallel
        results.append(
            exe.submit(bot.login, credential)  # each login takes ~15 s
        )  # collect a future for each login
    for result in as_completed(results):  # once a login completes, call like
        if result.result() == 200:
            with Pool(4) as p:
                resp = p.map(bot.like, urls)
                print(resp)
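One thing worth noting about the parallel version, independent of anything else: `ProcessPoolExecutor` pickles the object into a worker process, so any state that `login` stores on `bot` (cookies, headers) lives only in the child process; the parent's `bot` never sees it. A minimal sketch of that behavior, where `Holder` is a hypothetical stand-in for the bot:

```python
from concurrent.futures import ProcessPoolExecutor

class Holder:
    """Hypothetical stand-in for the bot: login would store cookies on self."""
    def __init__(self):
        self.value = None

    def set_value(self, v):
        self.value = v      # this mutation happens in the worker process
        return v            # only the return value travels back to the parent

def demo():
    h = Holder()
    with ProcessPoolExecutor() as exe:
        result = exe.submit(h.set_value, 200).result()
    # result is 200 (the status comes back fine), but h.value is still
    # None in the parent: the worker mutated its own copy of the object.
    return result, h.value

if __name__ == '__main__':
    print(demo())
```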
Even though it returns status_code 200 to me, when I look at the post, it isn't liked.
When I try to do this separately for each login, it returns the same thing, but this time the post actually gets liked. That is:
bot = Insta()
resp = bot.login(credential)
if resp == 200:
    with Pool(5) as p:
        p.map(bot.like, urls)
Can anyone tell me what the problem is? I'd like to know what I'm doing wrong. My like method currently looks like this:
def like(self, url_post):
    self._set_id_post(url_post)  # id of post
    resp = self.session.get(url_post)
    self.session.headers = {'user-agent': self.user_agent}
    self.session.headers.update({'Referer': url_post})
    self.session.headers.update({'X-CSRFToken': resp.cookies['csrftoken']})
    url = endpoints['like_url'] % self.post_id
    time.sleep(random.gauss(6, 1.5))
    response = self.session.post(url)
    self.session.headers.update({'X-CSRFToken': resp.cookies['csrftoken']})
    if response.status_code in (200, 403, 400):
        return response.status_code
I solved this by setting a proxy for each user. I had to buy them, because the public ones didn't work in my case. But for those facing a similar problem with web scraping outside social networks, I'll paste here one of my functions that returns elite proxies from a free site; it may help.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException

def free_proxy_list():
    options = Options()
    options.headless = True
    driver = webdriver.Chrome(options=options)
    driver.get('https://free-proxy-list.net/')
    # Show only elite proxies
    driver.find_element_by_xpath('//*[@id="proxylisttable"]/tfoot/tr/th[5]/select/option[3]').click()
    # Show only SSL
    driver.find_element_by_xpath('//*[@id="proxylisttable"]/tfoot/tr/th[7]/select/option[3]').click()
    proxies = []
    # Page 1
    for i in range(1, 21):
        xpath_ip = driver.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr[%s]/td[1]' % i).text
        xpath_port = driver.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr[%s]/td[2]' % i).text
        proxies.append(xpath_ip + ":" + xpath_port)
    # Page 2
    driver.find_element_by_xpath('//*[@id="proxylisttable_paginate"]/ul/li[4]/a').click()
    try:
        for i in range(1, 21):
            xpath_ip = driver.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr[%s]/td[1]' % i).text
            xpath_port = driver.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr[%s]/td[2]' % i).text
            proxies.append(xpath_ip + ":" + xpath_port)
    except NoSuchElementException:
        return proxies
    return proxies
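To wire these proxies into the accounts, the idea is one proxy per user. A minimal sketch of the pairing (`assign_proxies` is a hypothetical helper; how you attach the proxy to each session depends on your HTTP library):

```python
from itertools import cycle

def assign_proxies(credentials, proxies):
    """Pair each credential with its own proxy, cycling through the proxy
    list if there are more accounts than proxies (hypothetical helper)."""
    return list(zip(credentials, cycle(proxies)))

# With a requests-style session, each bot would then use its assigned proxy:
# session.proxies = {'http': 'http://%s' % proxy, 'https': 'http://%s' % proxy}
```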