Trying to extract data from HTML into a DataFrame
From this HTML:
<table width="100%" cellpadding="0" cellspacing="0"><tr><td><table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-22-20 01:30AM </td><td align="left"><a href="https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html" target="_blank" class="tab-link-news">STMicro Sees Declining Demand for Automotive Chips Next Quarter</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-21-20 10:43PM </td><td align="left"><a href="https://www.investors.com/market-trend/stock-market-today/dow-jones-futures-crude-oil-prices-test-coronavirus-stock-market-rally-netflix-snap-chipotle-earnings/?src=A00220" target="_blank" class="tab-link-news">Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers</a> <span style="color:#aa6dc0;font-size:9px">Investor's Business Daily</span></td></tr>
<tr><td width="130" align="right">09:31PM </td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">08:00PM </td><td align="left"><a href="https://finance.yahoo.com/news/plastic-bags-making-comeback-last-000001077.html" target="_blank" class="tab-link-news">Plastic Bags Are Making a Comeback. Will It Last?</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">07:27PM </td><td align="left"><a href="https://finance.yahoo.com/news/rpt-bluetooth-phone-apps-tracking-232727649.html" target="_blank" class="tab-link-news">RPT-Bluetooth phone apps for tracking COVID-19 show modest early results</a> <span style="color:#aa6dc0;font-size:9px">Reuters</span></td></tr>
<tr><td width="130" align="right" style="white-space:nowrap">Apr-20-20 09:00PM </td><td align="left"><a href="https://finance.yahoo.com/news/jerremy-newsome-shares-rules-options-010014004.html" target="_blank" class="tab-link-news">Jerremy Newsome Shares The Rules For His Options Strategy</a> <span style="color:#aa6dc0;font-size:9px">Benzinga</span></td></tr>
</table>
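Rendered, it shows:
Apr-22-20 01:30AM | STMicro Sees Declining Demand for Automotive Chips Next Quarter | Bloomberg
Apr-21-20 10:43PM | Dow Jones Futures: Crashing Crude Oil Prices Test Coronavirus Stock Market Rally; 5 Big Earnings Movers | Investor's Business Daily
09:31PM | Facebook to Invest $5.7 Billion in Ambanis Jio Platforms | Bloomberg
08:00PM | Plastic Bags Are Making a Comeback. Will It Last? | Bloomberg
07:27PM | RPT-Bluetooth phone apps for tracking COVID-19 show modest early results | Reuters
Apr-20-20 09:00PM | Jerremy Newsome Shares The Rules For His Options Strategy | Benzinga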
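Each `<tr>` in this table holds two cells: the timestamp, and a cell containing the headline `<a class="tab-link-news">` followed by a `<span>` that names the source. A minimal sketch, assuming lxml and pandas are installed, reads all four fields from the same row in one pass, so headline, link and source cannot drift out of alignment. `SNIPPET` (a two-row excerpt of the markup above) and `parse_news_table` are illustrative names, not part of the original code.

```python
from lxml import html
import pandas as pd

# Hypothetical two-row excerpt of the news-table markup from the question
SNIPPET = """
<table width="100%" cellpadding="1" cellspacing="0" border="0" id="news-table" class="fullview-news-outer">
<tr><td width="130" align="right" style="white-space:nowrap">Apr-22-20 01:30AM </td><td align="left"><a href="https://finance.yahoo.com/news/stmicro-sees-declining-demand-automotive-053033014.html" target="_blank" class="tab-link-news">STMicro Sees Declining Demand for Automotive Chips Next Quarter</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
<tr><td width="130" align="right">09:31PM </td><td align="left"><a href="https://finance.yahoo.com/news/facebook-plow-5-7-billion-005209259.html" target="_blank" class="tab-link-news">Facebook to Invest $5.7 Billion in Ambanis Jio Platforms</a> <span style="color:#aa6dc0;font-size:9px">Bloomberg</span></td></tr>
</table>
"""


def parse_news_table(table):
    """Build one record per <tr>, taking every field from that same row."""
    records = []
    for row in table.xpath('.//tr'):
        cells = row.xpath('./td')
        if len(cells) < 2:
            continue  # skip rows without both a timestamp cell and a headline cell
        links = cells[1].xpath('.//a[@class="tab-link-news"]')
        spans = cells[1].xpath('.//span')
        records.append({
            'Datetime': cells[0].text_content().strip(),
            'Description': links[0].text_content().strip() if links else None,
            'url': links[0].get('href') if links else None,
            'Source': spans[0].text_content().strip() if spans else None,
        })
    return pd.DataFrame(records)


table = html.fromstring(SNIPPET)  # a single top-level <table> comes back as that element
df = parse_news_table(table)
print(df)
```

On the live page, the same function could be applied to the table element located with `cssselect`.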
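import pandas as pd
# Assumed source of http_request_get: the helper module of the finviz package,
# which returns (parsed lxml tree, final URL) when parse=True
from finviz.helper_functions.request_functions import http_request_get

ticker = 'AAPL'
STOCK_URL = 'https://finviz.com/quote.ashx'

# Fetch and parse the quote page for the ticker
page_parsed, _ = http_request_get(url=STOCK_URL, payload={'t': ticker}, parse=True)
# The news table, plus every headline link on the page in document order
table = page_parsed.cssselect('table[class="fullview-news-outer"]')[0]
all_news = page_parsed.cssselect('a[class="tab-link-news"]')

# td//text() yields four text nodes per row: date/time, headline,
# the space between </a> and <span>, and the source
headers = ['Datetime', 'Description', 'Space', 'Source']
urls = [row.get('href') for row in all_news]
data = [dict(zip(headers, row.xpath('td//text()'))) for row in table]

df1 = pd.DataFrame(urls)
df2 = pd.DataFrame(data)
# Join rows to URLs by position; this assumes each row holds exactly one headline link
mergedDf = df2.merge(df1, left_index=True, right_index=True)
mergedDf = mergedDf.rename(columns={0: "url"})
mergedDf = mergedDf.drop(['Space'], axis=1)
mergedDf['ticker'] = ticker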
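In the resulting frame, `Datetime` carries a full date only on the first headline of each day; later rows show just a time such as `09:31PM`. If a proper timestamp column is wanted, one option is to split off the date, forward-fill it down to the time-only rows, and parse the result. This is a minimal pandas sketch: the sample values are copied from the table above, and the format string `%b-%d-%y %I:%M%p` is inferred from those strings, not taken from any finviz documentation.

```python
import pandas as pd

# Sample Datetime values copied from the question's news table
df = pd.DataFrame({'Datetime': [
    'Apr-22-20 01:30AM ', 'Apr-21-20 10:43PM ', '09:31PM ',
    '08:00PM ', '07:27PM ', 'Apr-20-20 09:00PM ',
]})

# "Apr-21-20 10:43PM" splits into [date, time]; "09:31PM" yields only [time]
parts = df['Datetime'].str.strip().str.split(' ', n=1)
df['date'] = parts.apply(lambda p: p[0] if len(p) == 2 else None)
df['time'] = parts.apply(lambda p: p[-1])

# A time-only row belongs to the closest date above it
df['date'] = df['date'].ffill()

# Combine and parse; the format is inferred from the strings shown above
df['timestamp'] = pd.to_datetime(df['date'] + ' ' + df['time'],
                                 format='%b-%d-%y %I:%M%p')
print(df[['date', 'time', 'timestamp']])
```

On the scraped data, the same steps would run on `mergedDf` in place of the sample frame.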