Need help parsing HTML into a Python DataFrame

Question (votes: 1, answers: 1)

I am trying to extract data from HTML into a DataFrame.

[Image: the table]

It comes from this HTML:

<table width="100%" cellpadding="0" cellspacing="0"><tr><td><table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-22-20 01:30AM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html" target="_blank" class="tab-link-news">STMicro Sees Declining Demand for Automotive Chips Next Quarter</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-21-20 10:43PM&nbsp;&nbsp;</td><td align="left"><a href="https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220" target="_blank" class="tab-link-news">Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers</a> <span style="color:#aa6dc0;font-size:9px">Investor's Business Daily</span></td></tr>
<tr><td width="130" align="right">09:31PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">08:00PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/plastic-bags-making-comeback-last-000001077.html" target="_blank" class="tab-link-news">Plastic Bags Are Making a Comeback. Will It Last?</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">07:27PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/rpt-bluetooth-phone-apps-tracking-232727649.html" target="_blank" class="tab-link-news">RPT-Bluetooth phone apps for tracking COVID-19 show modest early results</a> <span style="color:#aa6dc0;font-size:9px">Reuters</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-20-20 09:00PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/jerremy-newsome-shares-rules-options-010014004.html" target="_blank" class="tab-link-news">Jerremy Newsome Shares The Rules For His Options Strategy</a> <span style="color:#aa6dc0;font-size:9px">Benzinga</span></td></tr>
</table>

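Apr-22-20 01:30AM  STMicro Sees Declining Demand for Automotive Chips Next Quarter  Bloomberg
Apr-21-20 10:43PM  Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers  Investor's Business Daily
09:31PM  Facebook to Invest $5.7 Billion in Ambanis Jio Platforms  Bloomberg
08:00PM  Plastic Bags Are Making a Comeback. Will It Last?  Bloomberg
07:27PM  RPT-Bluetooth phone apps for tracking COVID-19 show modest early results  Reuters
Apr-20-20 09:00PM  Jerremy Newsome Shares The Rules For His Options Strategy  Benzinga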

python dataframe finance
1 Answer
0 votes
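import pandas as pd
# http_request_get is not defined in the post. It appears to be the helper
# from the finviz package; the import path below is an assumption.
from finviz.helper_functions.request_functions import http_request_get

ticker = 'AAPL'
NEWS_URL = 'https://finviz.com/news.ashx'  # defined in the original, unused below
STOCK_URL = 'https://finviz.com/quote.ashx'

# Fetch the quote page for the ticker and parse it into an element tree.
page_parsed, _ = http_request_get(url=STOCK_URL, payload={'t': ticker}, parse=True)

# The news table and every headline link inside the page.
table = page_parsed.cssselect('table[class="fullview-news-outer"]')[0]
all_news = page_parsed.cssselect('a[class="tab-link-news"]')

# Each row yields four text nodes: date/time, headline, a blank spacer, source.
headers = ['Datetime', 'Description', 'Space', 'Source']
urls = [row.get('href') for row in all_news]
data = [dict(zip(headers, row.xpath('td//text()'))) for row in table[0:]]

# Join the row text with the link URLs by position, then tidy the columns.
df1 = pd.DataFrame(urls)
df2 = pd.DataFrame(data)
mergedDf = df2.merge(df1, left_index=True, right_index=True)
mergedDf = mergedDf.rename(columns={0: "url"})
mergedDf = mergedDf.drop(['Space'], axis=1)
mergedDf['ticker'] = ticker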
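One thing the code above leaves open is that some rows in the question's table show only a time (for example 09:31PM), with the date implied by the nearest dated row above. Below is a minimal sketch of one way to handle that, using only the standard library so it runs without a network request. The names NewsTableParser, parse_news and SAMPLE_HTML are hypothetical, the sample is two consecutive rows copied from the question, and the sketch assumes the first row always carries a date and that month abbreviations are parsed in an English locale.

```python
from datetime import datetime
from html.parser import HTMLParser

# Two consecutive rows from the question: a dated row, then a time-only row.
SAMPLE_HTML = """
<table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-21-20 10:43PM&nbsp;&nbsp;</td><td align="left"><a href="https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220" target="_blank" class="tab-link-news">Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers</a> <span style="color:#aa6dc0;font-size:9px">Investor's Business Daily</span></td></tr>
<tr><td width="130" align="right">09:31PM&nbsp;&nbsp;</td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
</table>
"""


class NewsTableParser(HTMLParser):
    """Collect (raw time, headline, source, url) for each <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._cells = None    # text of each <td> in the current row
        self._href = None     # link target of the headline
        self._source = ""     # text inside the <span>
        self._in_td = False
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._cells, self._href, self._source = [], None, ""
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True
        elif tag == "a" and self._in_td and attrs.get("class") == "tab-link-news":
            self._href = attrs.get("href")
        elif tag == "span" and self._in_td:
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "span":
            self._in_span = False
        elif tag == "tr" and self._cells is not None:
            # Keep only rows that look like news rows: two cells and a link.
            if len(self._cells) == 2 and self._href:
                self.rows.append((self._cells[0].strip(), self._cells[1].strip(),
                                  self._source.strip(), self._href))
            self._cells = None

    def handle_data(self, data):
        if self._in_span:
            self._source += data
        elif self._in_td:
            self._cells[-1] += data


def parse_news(html_text):
    """Return a list of dicts, carrying the last seen date into time-only rows."""
    parser = NewsTableParser()
    parser.feed(html_text)
    records, current_date = [], None
    for raw_time, headline, source, url in parser.rows:
        parts = raw_time.split()
        if len(parts) == 2:          # "Apr-21-20 10:43PM": date and time
            current_date, time_part = parts
        else:                        # "09:31PM": reuse the previous date
            time_part = parts[0]
        stamp = datetime.strptime(f"{current_date} {time_part}", "%b-%d-%y %I:%M%p")
        records.append({"Datetime": stamp, "Description": headline,
                        "Source": source, "url": url})
    return records


records = parse_news(SAMPLE_HTML)
```

With pandas installed, `pd.DataFrame(records)` turns the list into a frame with Datetime, Description, Source and url columns. Taking the URL from the same row as its text avoids the positional merge, which only lines up if the link list and the table rows are in the same order and have the same length.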