Instagram API starts returning the loading page after a few calls


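I am using the code below to fetch information for a thousand Instagram accounts with asyncio. The first requests return the correct output, but after 10-20 calls Instagram starts returning the HTML of its loading page. What might I be doing wrong here? Here is the Python code.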

import asyncio

from aiohttp import ClientSession
async def fetch(url, session,sem):
    print("------")
    print(url)
    async with session.get(url = url) as response:
        print(await response.text())
        await  response.text()
        # exit()
        if response.status == 200:
            await sem.acquire()
            fname = url[22:]
            fname = fname.split('/')
            fname = fname[0] + '.txt'
            f = open(fname , 'w')
            f.write(str(await response.text()))
            sem.release()

        # return (await response.text())


async def run(url_list):
    tasks = []

    # create instance of Semaphore
    sem = asyncio.Semaphore(2)
    # Create client session that will ensure we dont open new connection
    # per each request.
    async with ClientSession() as session:
        for url in url_list:
            task = asyncio.ensure_future(fetch(url, session,sem))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses


# making the url list here
url_list = []
file = open('url.txt', 'r')
for url in file:
    url_list.append(url)

print(url_list)
import time
old = time.time()
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(url_list))
loop.run_until_complete(future)

print(time.time() - old)

Below are some of the URLs from the url.txt file:


https://instagram.com/johanna_kre/?__a=1
https://instagram.com/channie_f/?__a=1
https://instagram.com/lilakuh68/?__a=1
https://instagram.com/nataliacallisto/?__a=1
https://instagram.com/edbastian/?__a=1
https://instagram.com/sylvana.h/?__a=1
https://instagram.com/munich_bombon/?__a=1
https://instagram.com/younotus/?__a=1
https://instagram.com/meet.herbert/?__a=1
https://instagram.com/inaaogo/?__a=1
https://instagram.com/dennisaogo/?__a=1
https://instagram.com/mrslight__/?__a=1
https://instagram.com/reneturrek/?__a=1
https://instagram.com/_eeasyyy/?__a=1
https://instagram.com/sentinobln/?__a=1
https://instagram.com/eri.ka_g/?__a=1
python api python-requests instagram aiohttp
1 Answer

Your semaphore does not limit the requests the way you intend. You should acquire it before making the request, not before processing the content.

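In your current implementation you issue up to 100 concurrent requests (the default limit of aiohttp's client) while processing only two responses at a time. By the time a response is waiting for the semaphore, the server has already handled the request, so from its point of view nothing was throttled.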
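The difference can be seen in a small simulation that needs no network. This is only a sketch: `fake_request`, `InFlight`, `acquire_after` and `acquire_before` are hypothetical names, `fake_request` stands in for `session.get`, and aiohttp's own connection limit is not modelled, so the unthrottled case reaches the full task count rather than 100.

```python
import asyncio


class InFlight:
    """Counts how many simulated requests are open at the same time."""

    def __init__(self):
        self.current = 0
        self.peak = 0


async def fake_request(counter):
    # Hypothetical stand-in for session.get(): the "server" sees the
    # request from the moment this coroutine starts running.
    counter.current += 1
    counter.peak = max(counter.peak, counter.current)
    await asyncio.sleep(0.01)  # simulated network latency
    counter.current -= 1


async def acquire_after(counter, sem):
    # Pattern from the question: the request is sent first and only the
    # processing of the result is guarded by the semaphore.
    await fake_request(counter)
    async with sem:
        await asyncio.sleep(0)  # simulated processing


async def acquire_before(counter, sem):
    # Pattern from the answer: the semaphore guards the request itself.
    async with sem:
        await fake_request(counter)


async def peak_in_flight(worker, n=20, limit=2):
    counter = InFlight()
    sem = asyncio.Semaphore(limit)
    await asyncio.gather(*(worker(counter, sem) for _ in range(n)))
    return counter.peak


if __name__ == "__main__":
    print(asyncio.run(peak_in_flight(acquire_after)))   # 20
    print(asyncio.run(peak_in_flight(acquire_before)))  # 2
```

With the semaphore taken after the request, all 20 simulated requests are open at once. With it taken before, never more than two are.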

Use:

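async def fetch(url, session, sem):
    print("------")
    print(url)
    # Acquire the semaphore before sending the request, so that it limits
    # the requests themselves and not just the processing of responses.
    await sem.acquire()
    async with session.get(url=url) as response:
        print(await response.text())
        ...
    # Release only once the response has been handled.
    sem.release()
    ...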
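
The same idea can also be written with `async with sem:`, which releases the semaphore even if the request raises. The following is a minimal sketch and not the answerer's code. It assumes `session` is an aiohttp `ClientSession` as in the question. Passing `session` into `run`, the `limit` parameter and the returned `(status, body)` tuple are choices made for illustration only. `url.strip()` removes the trailing newline that lines read from `url.txt` carry.

```python
import asyncio


async def fetch(url, session, sem):
    """Fetch one URL while holding the semaphore for the whole request.

    `session` is assumed to be an aiohttp ClientSession (or any object
    whose get() returns an async context manager with .status and .text()).
    """
    async with sem:  # released automatically, even on an exception
        async with session.get(url) as response:
            body = await response.text()  # read the body once and reuse it
            return response.status, body


async def run(url_list, session, limit=2):
    """Fetch all URLs with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)
    tasks = (fetch(url.strip(), session, sem) for url in url_list)
    return await asyncio.gather(*tasks)
```

Inside the question's `run`, this would be called within the existing `async with ClientSession() as session:` block, and the file writing could then work on the returned body instead of awaiting `response.text()` again.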