Extracting the title from a link using BeautifulSoup

Question Votes: -3 Answers: 2

I am scraping a website with BeautifulSoup and need some help, as I am new to both Python and BeautifulSoup. How can I get VET out of the following: "[[VET]]"?

This is my code so far:

import bs4 as bs
import urllib.request
import pandas as pd


#This is the Home page of the website
source = urllib.request.urlopen('file:///C:/Users/Aiden/Downloads/stocks/Stock%20Premarket%20Trading%20Activity%20_%20Biggest%20Movers%20Before%20the%20Market%20Opens.html').read().decode('utf-8')

soup = bs.BeautifulSoup(source,'lxml')


#find the Div and put all info into varTable
table = soup.find('table',{"id":"decliners_tbl"}).tbody



#find all Rows in table and puts into varTableRows
tableRows = table.find_all('tr')
print ("There is ",len(tableRows),"Rows in the Table")
print(tableRows)

columns = [tableRows[1].find_all('td')]
print(columns)

a = [tableRows[1].find_all("a")]
print(a)

The output of print(a) is "[[<a class="mplink popup_link" href="https://marketchameleon.com/Overview/VET/">VET</a>]]"
and I want to extract "VET" from it.


python screen-scraping
2 Answers
0
votes

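You can use a.text or a.get_text() on the tag itself. In your code, a is a list wrapping the result of find_all, so index into it first, for example a[0][0].text.

If you have several elements, apply this in a list comprehension.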
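A minimal sketch of both cases, assuming a made-up HTML fragment in place of the real page. Only the VET link comes from the question; the second link and the variable names are hypothetical. Note that find_all already returns a list-like ResultSet, so wrapping it in [] is what produces the nested "[[...]]" output.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one table row; only the first link is taken
# from the question, the second one is a hypothetical extra.
html = """
<table><tr>
<td><a class="mplink popup_link" href="https://marketchameleon.com/Overview/VET/">VET</a></td>
<td><a class="mplink popup_link" href="https://example.com/ABC/">ABC</a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# find_all already returns a list-like ResultSet, so no extra [] is needed.
links = row.find_all("a")

# Single element: call get_text() (or read .text) on the Tag itself.
first_ticker = links[0].get_text(strip=True)

# Several elements: a list comprehension over the tags.
tickers = [link.get_text(strip=True) for link in links]

print(first_ticker)
print(tickers)
```

Passing strip=True trims surrounding whitespace, which is often present in scraped table cells.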


0
votes

Thanks for all the replies. I was able to solve the problem with the following code:

import bs4 as bs
import urllib.request

# Load the saved page from disk
source = urllib.request.urlopen('file:///C:/Users/Aiden/Downloads/stocks/Stock%20Premarket%20Trading%20Activity%20_%20Biggest%20Movers%20Before%20the%20Market%20Opens.html').read().decode('utf-8')

soup = bs.BeautifulSoup(source, 'html.parser')

# Locate the decliners table by its id
table = soup.find("table", id="decliners_tbl")

for decliners in table.find_all("tbody"):
    rows = decliners.find_all("tr")
    for row in rows:
        # The ticker is the text of the first link in the row
        ticker = row.find("a").text
        # The volume is the fourth cell with class "rightcell"
        volume = row.find_all("td", class_="rightcell")[3].text
        print(ticker, volume)
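
For anyone who wants to try the loop without the saved file, here is a small sketch that runs against an inline sample. The table id and the "rightcell" class follow the code above. The sample markup, its numbers, and the helper name extract_rows are made up for illustration. It also skips rows that have no link or fewer than four "rightcell" cells, which is an added assumption about how the real page might look.

```python
from bs4 import BeautifulSoup

# Made-up sample table; the id and class names follow the question,
# the numeric values are invented for illustration.
html = """
<table id="decliners_tbl">
  <tbody>
    <tr><th>Symbol</th></tr>
    <tr>
      <td><a href="https://marketchameleon.com/Overview/VET/">VET</a></td>
      <td class="rightcell">1.0</td>
      <td class="rightcell">2.0</td>
      <td class="rightcell">3.0</td>
      <td class="rightcell">12345</td>
    </tr>
  </tbody>
</table>
"""


def extract_rows(markup):
    """Return (ticker, volume) pairs from the decliners table (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", id="decliners_tbl")
    results = []
    if table is None:
        return results
    for row in table.find_all("tr"):
        link = row.find("a")
        cells = row.find_all("td", class_="rightcell")
        # Skip header or malformed rows instead of raising an exception.
        if link is None or len(cells) < 4:
            continue
        results.append((link.get_text(strip=True), cells[3].get_text(strip=True)))
    return results


rows = extract_rows(html)
print(rows)
```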