How to loop over large HTML content in groups:

Problem description

I wrote the following code:

from bs4 import BeautifulSoup
import requests
import time

data = [{"operationName": "SearchQuery", "variables": {"query": "Pedro Alvares", "after": None, "first": 10},
         "query": "query SearchQuery($query: String!, $first: Int!, $after: ID) {\n  questionSearch(query: $query, first: $first, after: $after) {\n    count\n    edges {\n      node {\n        id\n        databaseId\n        author {\n          id\n          databaseId\n          isDeleted\n          nick\n          avatar {\n            thumbnailUrl\n            __typename\n          }\n          rank {\n            name\n            __typename\n          }\n          __typename\n        }\n        content\n        answers {\n          nodes {\n            thanksCount\n            ratesCount\n            rating\n            __typename\n          }\n          hasVerified\n          __typename\n        }\n        __typename\n      }\n      highlight {\n        contentFragments\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"}]
r = requests.post("https://brainly.com.br/graphql/pt", json=data).json()

p=[]    
for item in r[0]['data']['questionSearch']['edges']:
    rst=(f"https://brainly.com.br/tarefa/{item['node']['databaseId']}")
    p.append(rst)

I want to print the HTML of each link, so I tried this:

for ele in p: 
    r = requests.get(p).text 
    time.sleep(5) 
    print(r)

My question is whether there is a way to improve this loop. Afterwards, I will filter these HTML pages.

python loops

1 Answer

First, replace p with ele in the second line of the loop. You also don't need time.sleep(5): requests.get is synchronous, so the next iteration doesn't start until the request has completed, and the delay serves no purpose.
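A minimal sketch of the corrected loop described above. The function names task_urls and fetch_all are illustrative and not part of the original code; the Session reuse and timeout are additions I'd suggest, not something the answer requires.

```python
import requests

def task_urls(database_ids):
    # Build Brainly task URLs from the databaseId values collected earlier.
    return [f"https://brainly.com.br/tarefa/{did}" for did in database_ids]

def fetch_all(urls):
    # Fetch each page's HTML. requests is synchronous, so no sleep is
    # needed between iterations unless you want to rate-limit yourself.
    session = requests.Session()  # reuse one connection across requests
    pages = []
    for url in urls:  # iterate over each element, not the whole list
        resp = session.get(url, timeout=10)
        resp.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
        pages.append(resp.text)
    return pages
```

Usage: pages = fetch_all(task_urls(ids)), then filter each HTML string in pages with BeautifulSoup as planned.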
