Trying to scrape specific rows with BeautifulSoup, but I don't think it parses by class or rowid. For example, I want the text of rowid=2


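I'm trying to get the information (text) for rowid=2 (which should narrow the content down considerably), but I keep getting the text of the td(s) or tr(s). Could you help me solve this?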

import requests
from bs4 import BeautifulSoup
headers_param = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"}
bist = requests.get("https://www.borsaistanbul.com/tr/endeks", headers=headers_param)
# print(bist.content)
jobs = bist.content
soup = BeautifulSoup(jobs,"html.parser")
# Rows whose custom "rowid" attribute equals "2"
all_jobs = soup.find_all("tr",{"rowid":"2"})
# for job in all_jobs:
#     print(job)
# Selects every <tr> on the page
all_jobs = soup.find_all("tr")
for data in all_jobs:
    print(data.text)
1 Answer

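You are still using .text, and you also select all_jobs a second time: all_jobs = soup.find_all("tr") overwrites the rowid=2 result with every <tr> on the page. Apart from that, I don't see a general problem.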
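To see the effect of that second assignment without requesting the live site, here is a minimal sketch on a made-up three-row table. The inline HTML is hypothetical, not taken from borsaistanbul.com; it only mimics rows that carry a custom rowid attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: rows carry a custom
# "rowid" attribute, like the rows the question targets.
html = """
<table>
  <tr rowid="1"><td>Row one</td></tr>
  <tr rowid="2"><td>Row two</td></tr>
  <tr rowid="3"><td>Row three</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Filtered search: only <tr> elements whose rowid attribute is "2".
rowid_rows = soup.find_all("tr", {"rowid": "2"})

# Unfiltered search: if this is assigned to the same name (as in the
# question), the filtered result above is lost and every <tr> comes back.
every_row = soup.find_all("tr")

print(len(rowid_rows), len(every_row))  # → 1 3
for row in rowid_rows:
    print(row.text)  # → Row two
```

Keeping the two results under different names (or simply not running the second find_all) is enough to get only the rowid=2 rows.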

You could also clean up your code and use get_text() or stripped_strings to get cleaner results:

# Option 1: get_text() with a separator and strip=True
all_jobs = soup.find_all("tr",{"rowid":"2"})
for data in all_jobs:
    print(data.get_text(',', strip=True))

# Option 2: join the stripped_strings generator
all_jobs = soup.find_all("tr",{"rowid":"2"})
for data in all_jobs:
    print(','.join(data.stripped_strings))

Output:

BIST 100,XU100,3.11.2023,7.705,99,0,55,7.721,18,7.581,57,TL,2,55,39,88,01.01.1986,0,01
BIST-KYD DİBS 91 GUN,TD91G,3.11.2023,3.298,44448,-0,89,3.330,14214,3.298,44448,TL,-1,00,9,03,31.12.2001,100
BIST ALTIN,ATKMP,3.11.2023,4.575,76454,-0,52,4.655,55300,4.528,85721,TL,-0,25,9,64,31.12.2004,1000
BIST 100 RK %10 (TOPLAM GETIRI),RK100T10,3.11.2023,2.106,8786,0,19,2.106,8786,2.106,8786,TL,0,78,18,33,31.12.2003,100,0000
TURK LIRASI GECELIK REFERANS FAIZ ORANI,TLREF,3.11.2023,34,1301,-0,11,34,1301,34,1301,TL,1,79,232,51,28.12.2018,0,00
TURK LIRASI GECELIK KATILIM REFERANS GETIRI ORANI,TLREFK,3.11.2023,33,4117,0,13,33,4117,33,4117,TL,0,80,243,73,22.06.2022,0,00
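One catch in the output above: the values use Turkish number formatting, where "," is the decimal separator, so a comma-joined line such as "7.705,99,0,55" can no longer be split back into columns reliably. A minimal sketch, assuming each value sits in its own <td> (inferred from the output, not verified against the live page), that reads the cells one by one instead. The inline HTML and the parse_tr_number helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the first output line above; the real
# page's structure may differ.
html = """
<table>
  <tr rowid="2">
    <td>BIST 100</td>
    <td>XU100</td>
    <td>3.11.2023</td>
    <td>7.705,99</td>
    <td>0,55</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def parse_tr_number(text):
    """Hypothetical helper: convert Turkish-formatted '7.705,99' to 7705.99."""
    return float(text.replace(".", "").replace(",", "."))


for row in soup.find_all("tr", {"rowid": "2"}):
    # Comma-joined text: decimal commas blur into the column separators.
    joined = row.get_text(",", strip=True)
    print(joined)  # → BIST 100,XU100,3.11.2023,7.705,99,0,55

    # One string per <td> keeps every column boundary intact.
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    print(cells)  # → ['BIST 100', 'XU100', '3.11.2023', '7.705,99', '0,55']
    print(parse_tr_number(cells[3]))  # → 7705.99
```

Alternatively, passing a separator that never appears in the data (for example "|") to get_text() keeps the single-line form while staying unambiguous.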