BeautifulSoup returns only the h1 from the last HTML source


I have the following script:

from bs4 import BeautifulSoup
import requests


data = [{"operationName": "SearchQuery", "variables": {"query": "Python", "after": None, "first": 2},
         "query": "query SearchQuery($query: String!, $first: Int!, $after: ID) {\n  questionSearch(query: $query, first: $first, after: $after) {\n    count\n    edges {\n      node {\n        id\n        databaseId\n        author {\n          id\n          databaseId\n          isDeleted\n          nick\n          avatar {\n            thumbnailUrl\n            __typename\n          }\n          rank {\n            name\n            __typename\n          }\n          __typename\n        }\n        content\n        answers {\n          nodes {\n            thanksCount\n            ratesCount\n            rating\n            __typename\n          }\n          hasVerified\n          __typename\n        }\n        __typename\n      }\n      highlight {\n        contentFragments\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"}]
r = requests.post("https://brainly.com.br/graphql/pt", json=data).json()

p = []
for item in r[0]['data']['questionSearch']['edges']:
    rst = f"https://brainly.com.br/tarefa/{item['node']['databaseId']}"
    p.append(rst)

for ele in p:
    r = requests.get(ele).text


soup = BeautifulSoup(r, 'html.parser')

for n in soup.find_all('div', attrs={'class': 'brn-content-image'}):
    print(n.find('h1').text)

And I need to filter these 2 HTML snippets:

<div class="brn-content-image">
<h1 class="sg-text sg-text--large sg-text--regular">
O que é for em python?​
</h1>
</div>

and:

<div class="brn-content-image">
<h1 class="sg-text sg-text--large sg-text--regular">
Linguagem ( Python )<br /><br />a) Quem foi(ram) o(s) criador(es) do python? <br /><br />b) Cite como se declara uma variáveis:<br /><br />c) O que é uma variável?<br /><br />d) O que é uma função?<br /><br />e) para que serve às { } no python?​​
</h1>
</div> 

Expected output:

1 h1 - Linguagem ( Python )

a) Quem foi(ram) o(s) criador(es) do python?

b) Cite como se declara uma variáveis:

c) O que é uma variável?

d) O que é uma função?

e) para que serve às { } no python?

2 h1 - O que é for em python?

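I have 2 HTML pages in the same variable, but I can only filter the h1 of the 2nd question, that is: O que é for em python?

I need to print both of them! What am I doing wrong?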


python beautifulsoup
1 Answer

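You are creating the soup variable outside the loop, which is why you only get the h1 from the second HTML page. Each pass through the for loop overwrites r, so by the time BeautifulSoup runs, only the last response is left. The parsing should happen inside the loop.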
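As a minimal sketch of that fix, the snippet below parses each page inside the loop, so no page overwrites another. The helper names h1_texts and collect_h1 are hypothetical, and PAGE_1 and PAGE_2 are shortened stand-ins for the two responses, assumed to have the same structure as the HTML in the question.

```python
from bs4 import BeautifulSoup


def h1_texts(html):
    """Return the text of each <h1> inside a div.brn-content-image."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div", attrs={"class": "brn-content-image"}):
        h1 = div.find("h1")
        if h1 is not None:  # skip containers that have no heading
            # separator="\n" puts each <br />-separated part on its own line
            texts.append(h1.get_text(separator="\n", strip=True))
    return texts


def collect_h1(pages):
    """Parse every page inside the loop and gather all h1 texts."""
    results = []
    for html in pages:
        results.extend(h1_texts(html))
    return results


# Shortened stand-ins for the two pages (assumed structure, not real responses)
PAGE_1 = (
    '<div class="brn-content-image"><h1 class="sg-text">'
    "Linguagem ( Python )<br /><br />"
    "a) Quem foi(ram) o(s) criador(es) do python?"
    "</h1></div>"
)
PAGE_2 = (
    '<div class="brn-content-image"><h1 class="sg-text">'
    "O que é for em python?"
    "</h1></div>"
)

for number, text in enumerate(collect_h1([PAGE_1, PAGE_2]), start=1):
    print(f"{number} h1 - {text}")
```

In the script from the question, the equivalent change would be to call something like h1_texts(requests.get(ele).text) inside the existing for ele in p loop, instead of building soup once after the loop has finished.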
