Scraping div class information with BeautifulSoup4

Question (votes: 0, answers: 1)

I am trying to pull all of the information in the table at https://www.macrotrends.net/stocks/charts/GM/general-motors/income-statement?freq=Q. Playing around in the Google Chrome console, I can collect every row I want with document.querySelectorAll("[role=row]").

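My problem is that I am trying to gather all of this information in Python. With BeautifulSoup4 I can retrieve the whole page, but the information appears to be structured differently. When I try to select elements by the role or class names I see in the browser, nothing comes back. The page as downloaded by Python looks completely different. Below is the snippet that contains the information I want (it does not look much like what I see in the browser):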

var originalData = [{"field_name":"<a href='\/stocks\/charts\/GM\/general-motors\/revenue'>Revenue<\/a>","popup_icon":"<div class='ajax-chart' data-tipped-options=\"ajax: {data: { t: 'GM', s: 'revenue', freq: 'Q', statement: 'income-statement' }}\"><i style='font-size:18px; color:#337ab7;' class='fas fa-chart-bar'><\/i><\/span><\/div>","2020-03-31":"32709.00000","2019-12-31":"30826.00000","2019-09-30":"35473.00000","2019-06-30":"36060.00000","2019-03-31":"34878.00000","2018-12-31":"38399.00000","2018-09-30":"35791.00000","2018-06-30":"36760.00000","2018-03-31":"36099.00000","2017-12-31":"37715.00000","2017-09-30":"33623.00000","2017-06-30":"36984.00000","2017-03-31":"37266.00000","2016-12-31":"35647.00000","2016-09-30":"38889.00000","2016-06-30":"37383.00000","2016-03-31":"37265.00000","2015-12-31":"22990.00000","2015-09-30":"38843.00000","2015-06-30":"38180.00000","2015-03-31":"35712.00000","2014-12-31":"39617.00000","2014-09-30":"39255.00000","2014-06-30":"39649.00000","2014-03-31":"37408.00000","2013-12-31":"40485.00000","2013-09-30":"38983.00000","2013-06-30":"39075.00000","2013-03-31":"36884.00000","2012-12-31":"39307.00000","2012-09-30":"37576.00000","2012-06-30":"37614.00000","2012-03-31":"37759.00000","2011-12-31":"37990.00000","2011-09-30":"36719.00000","2011-06-30":"39373.00000","2011-03-31":"36194.00000","2010-12-31":"36900.00000","2010-09-30":"34060.00000","2010-06-30":"33174.00000","2010-03-31":"31476.00000","2009-12-31":"","2008-12-31":""},...

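This is the information for the first row, "Revenue". Forgive my ignorance, but this looks like it is already a variable with everything organized as JSON. Is there a way to grab that variable and parse it like any other JSON data? Or is there another preferred way to collect this information? I have used free APIs before, but I have found that their data can sometimes be unreliable, or they switch to a subscription model (e.g. Financial Modeling Prep). Any advice is appreciated!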

python beautifulsoup finance
1 Answer
0 votes

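The rows appear to be built in the browser by JavaScript from the originalData variable, which would explain why the role=row elements are missing from the HTML that Python downloads. You can use the re and json modules to decode the data: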

import re
import json
import requests
from bs4 import BeautifulSoup
import pandas as pd

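url = 'https://www.macrotrends.net/stocks/charts/GM/general-motors/income-statement?freq=Q'

# Download the raw page source; the table data is embedded in it as a JS variable.
html = requests.get(url).text

# Capture the JSON array assigned to originalData and parse it.
data = json.loads(re.search(r'var originalData = (\[.*\])', html).group(1))

for d in data:
    # field_name holds an HTML link; keep only its visible text.
    d['field_name'] = BeautifulSoup(d['field_name'], 'html.parser').text
    # popup_icon is markup for the chart icon, not data.
    del d['popup_icon']

df = pd.DataFrame(data)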

print(df)

Prints:

                             field_name   2020-03-31   2019-12-31   2019-09-30   2019-06-30   2019-03-31   2018-12-31  ...   2011-03-31   2010-12-31   2010-09-30   2010-06-30   2010-03-31 2009-12-31 2008-12-31
0                               Revenue  32709.00000  30826.00000  35473.00000  36060.00000  34878.00000  38399.00000  ...  36194.00000  36900.00000  34060.00000  33174.00000  31476.00000                      
1                    Cost Of Goods Sold  30082.00000  29098.00000  31161.00000  31471.00000  31535.00000  35092.00000  ...  31850.00000  33171.00000  29587.00000  28609.00000  27553.00000                      
2                          Gross Profit   2627.00000   1728.00000   4312.00000   4589.00000   3343.00000   3307.00000  ...   4344.00000   3729.00000   4473.00000   4565.00000   3923.00000                      
3     Research And Development Expenses                                                                                ...                   0.00000                                                             
4                         SG&A Expenses   1970.00000   2282.00000   2008.00000   2102.00000   2099.00000   2478.00000  ...   2994.00000   3432.00000   2710.00000   2623.00000   2684.00000                      
5    Other Operating Income Or Expenses                                                                                ...     -6.00000     -3.00000    -30.00000    -39.00000    -46.00000                      
6                    Operating Expenses  32052.00000  31380.00000  33169.00000  33573.00000  33634.00000  37570.00000  ...  35245.00000  36603.00000  32327.00000  31271.00000  30283.00000                      
7                      Operating Income    657.00000   -554.00000   2304.00000   2487.00000   1244.00000    829.00000  ...    949.00000    297.00000   1733.00000   1903.00000   1193.00000                      
8    Total Non-Operating Income/Expense    118.00000   -655.00000    278.00000    440.00000    624.00000   -886.00000  ...    455.00000    747.00000    114.00000   -341.00000    109.00000                      
9                        Pre-Tax Income    775.00000  -1209.00000   2582.00000   2927.00000   1868.00000    -57.00000  ...   1404.00000   1026.00000   1847.00000   1562.00000   1302.00000                      
10                         Income Taxes    357.00000   -163.00000    271.00000    524.00000    137.00000   -611.00000  ...    137.00000   -173.00000    -25.00000    361.00000    509.00000                      
11                   Income After Taxes    418.00000  -1046.00000   2311.00000   2403.00000   1731.00000    554.00000  ...   1267.00000   1199.00000   1872.00000   1201.00000    793.00000                      
12                         Other Income                                                                                ...                   0.00000                                                             
13    Income From Continuous Operations    286.00000   -192.00000   2311.00000   2403.00000   2145.00000   2069.00000  ...   3411.00000   1406.00000   2223.00000   1612.00000   1196.00000                      
14  Income From Discontinued Operations                                                                                ...                   0.00000                                                             
15                           Net Income    247.00000   -232.00000   2313.00000   2381.00000   2119.00000   1992.00000  ...   3151.00000   1406.00000   1959.00000   1334.00000    865.00000                      
16                               EBITDA   3965.00000   2732.00000   5613.00000   5894.00000   5360.00000   4475.00000  ...    949.00000    297.00000   1733.00000   1903.00000   1193.00000                      
17                                 EBIT    657.00000   -554.00000   2304.00000   2487.00000   1244.00000    829.00000  ...    949.00000    297.00000   1733.00000   1903.00000   1193.00000                      
18             Basic Shares Outstanding   1433.00000   1424.00000   1428.00000   1420.00000   1417.00000   1411.00000  ...   1504.00000   1612.90300   1500.00000   1500.00000   1500.00000                      
19                   Shares Outstanding   1440.00000   1439.00000   1442.00000   1438.00000   1436.00000   1431.00000  ...   1817.00000   1612.90300   1630.00000   1567.00000   1567.00000                      
20                            Basic EPS      0.17000     -0.18000      1.62000      1.68000      1.50000      1.43000  ...      2.09000      0.31000      1.31000      0.89000      0.58000                      
21             EPS - Earnings Per Share      0.17000     -0.17000      1.60000      1.66000      1.48000      1.40000  ...      1.77000      0.31000      1.20000      0.85000      0.55000    0.00000    0.00000

[22 rows x 44 columns]
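
Note that every value in this frame is still a string, and missing quarters are empty strings, so arithmetic on the columns will not work yet. Here is a minimal follow-up sketch, not part of the original answer, that converts the values to floats. It also swaps the regex capture for json.JSONDecoder().raw_decode, which parses a single JSON value and ignores whatever follows it in the script. The helper names extract_original_data and to_numeric_frame and the shortened sample_html string are hypothetical and used only for illustration; the real page may of course change its layout.

```python
import json
import re

import pandas as pd
from bs4 import BeautifulSoup


def extract_original_data(html, var_name="originalData"):
    """Return the JSON array assigned to `var <var_name> = [...]` in html."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found in page source")
    # raw_decode parses one JSON value starting at the given index and
    # ignores what comes after it (the trailing ';' and the rest of the script).
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


def to_numeric_frame(data):
    """Build a DataFrame indexed by field name with float columns."""
    rows = []
    for item in data:
        row = dict(item)  # copy so the caller's data is left untouched
        # field_name is an HTML link; keep only its visible text.
        row["field_name"] = BeautifulSoup(row["field_name"], "html.parser").get_text()
        row.pop("popup_icon", None)
        rows.append(row)
    df = pd.DataFrame(rows).set_index("field_name")
    # Empty strings cannot be parsed, so errors="coerce" turns them into NaN.
    return df.apply(pd.to_numeric, errors="coerce")


# Hypothetical, shortened page source used only for illustration.
sample_html = r"""<script>
var originalData = [{"field_name":"<a href='\/stocks\/charts\/GM\/general-motors\/revenue'>Revenue<\/a>","popup_icon":"<div><\/div>","2020-03-31":"32709.00000","2019-12-31":"30826.00000","2009-12-31":""}];
var other = [1, 2];
</script>"""

df = to_numeric_frame(extract_original_data(sample_html))
print(df)
```

With the live page you would pass requests.get(url).text instead of sample_html. Calling df.T afterwards gives one row per quarter, which is usually a more convenient shape for plotting or time-series work.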