How do I get quarterly Yahoo Finance financial data, with the right dates, using Python?

Question (votes: 1, answers: 1)

I can download the annual data from this link with the following code, but it differs from what the website shows, because the data is dated June:


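Now I have two questions:

  1. How do I pin down the dates so that the annual data matches the figure below (September rather than June, as marked by the red rectangle)?
  2. Clicking Quarterly, marked by the orange rectangle, does not change the link. How do I get the quarterly data?

Thanks.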

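[Screenshot: the Yahoo Finance financials page, with the annual period dates marked by a red rectangle and the Quarterly toggle marked by an orange rectangle]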

python web-scraping yahoo-finance
1 answer
1 vote

Just curious, but why write the HTML to a file first and then read it with pandas? pandas can take the URL directly:

import pandas as pd

symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' % (symbol, symbol)

# read_html fetches the page and returns a list with one DataFrame per <table> found
dfs = pd.read_html(url)
print(dfs[0])

Second, I'm not sure why your annual dates come out as June. Doing it the way shown above gives September.

print(dfs[0])
                                         0  ...                                  4
0                                  Revenue  ...                          9/26/2015
1                            Total Revenue  ...                          233715000
2                          Cost of Revenue  ...                          140089000
3                             Gross Profit  ...                           93626000
4                       Operating Expenses  ...                 Operating Expenses
5                     Research Development  ...                            8067000
6       Selling General and Administrative  ...                           14329000
7                            Non Recurring  ...                                  -
8                                   Others  ...                                  -
9                 Total Operating Expenses  ...                          162485000
10                Operating Income or Loss  ...                           71230000
11       Income from Continuing Operations  ...  Income from Continuing Operations
12         Total Other Income/Expenses Net  ...                            1285000
13      Earnings Before Interest and Taxes  ...                           71230000
14                        Interest Expense  ...                            -733000
15                       Income Before Tax  ...                           72515000
16                      Income Tax Expense  ...                           19121000
17                       Minority Interest  ...                                  -
18          Net Income From Continuing Ops  ...                           53394000
19                    Non-recurring Events  ...               Non-recurring Events
20                 Discontinued Operations  ...                                  -
21                     Extraordinary Items  ...                                  -
22            Effect Of Accounting Changes  ...                                  -
23                             Other Items  ...                                  -
24                              Net Income  ...                         Net Income
25                              Net Income  ...                           53394000
26   Preferred Stock And Other Adjustments  ...                                  -
27  Net Income Applicable To Common Shares  ...                           53394000

[28 rows x 5 columns]
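
As printed, the table keeps the period-end dates in row 0 rather than in the column headers, and the numbers sit alongside text such as "-" and repeated section headings. Here is a minimal sketch of tidying that up, assuming the layout shown above. The helper name `tidy_statement` is my own, and the small frame only stands in for `dfs[0]`, using values copied from the output above:

```python
import pandas as pd


def tidy_statement(raw: pd.DataFrame) -> pd.DataFrame:
    """Promote row 0 (the period-end dates) to column headers and use
    column 0 (the line-item names) as the index."""
    header = raw.iloc[0]
    body = raw.iloc[1:].copy()
    body.columns = ["Item"] + [str(date) for date in header.iloc[1:]]
    body = body.set_index("Item")
    # Section-heading rows repeat their own label in every cell and "-" marks
    # a missing value; errors="coerce" turns both into NaN.
    return body.apply(pd.to_numeric, errors="coerce")


# Stand-in for dfs[0]: first and last columns of a few rows from the output above
raw = pd.DataFrame([
    ["Revenue", "9/26/2015"],
    ["Total Revenue", "233715000"],
    ["Cost of Revenue", "140089000"],
    ["Gross Profit", "93626000"],
    ["Operating Expenses", "Operating Expenses"],
    ["Non Recurring", "-"],
])

tidy = tidy_statement(raw)
print(tidy)
```

In the full table a label such as Net Income appears both as a section heading and as a line item, so the resulting index may contain duplicates.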

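For the second part, you can try to find the data in one of a few ways:

1) Inspect the XHR requests and get the data you want by putting the right parameters into the request URL that generates it, which returns the data to you as JSON (I couldn't find it right away when I looked, so I moved on to the next option)

2) Search the <script> tags, since the JSON can sometimes be embedded inside them (I didn't search thoroughly, and figured Selenium was the more direct route since pandas can read the tables)

3) Use Selenium to simulate opening a browser, get the table, click "Quarterly", then get that table

I went with option 3: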

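from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

symbol = 'AAPL'
url = 'https://finance.yahoo.com/quote/%s/financials?p=%s' % (symbol, symbol)

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get(url)

# Get the (annual) table shown in the browser
dfs_annual = pd.read_html(StringIO(driver.page_source))
print(dfs_annual[0])

# Click "Quarterly" (find_element_by_xpath was removed in Selenium 4)
driver.find_element(By.XPATH, "//span[text()='Quarterly']").click()

# Get the table again; if it is unchanged, the page may need an explicit wait to re-render
dfs_quarter = pd.read_html(StringIO(driver.page_source))
print(dfs_quarter[0])

# quit() ends the whole browser session; close() only closes the current window
driver.quit()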
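
For completeness, option 2 (searching the <script> tags) could look roughly like the sketch below. It is only an illustration: the variable name `window.pageData` and the shape of the JSON are hypothetical, not something I found on the Yahoo page, and the sample HTML is a stand-in that reuses one figure from the output above. The idea is to locate the assignment inside the script and let the `json` module parse the object that follows it:

```python
import json
import re


def extract_embedded_json(html: str, marker: str) -> dict:
    """Return the JSON value assigned to `marker` somewhere in `html`."""
    match = re.search(re.escape(marker) + r"\s*=\s*", html)
    if match is None:
        raise ValueError("marker %r not found" % marker)
    # raw_decode parses exactly one JSON value starting at the given position
    # and ignores whatever follows it (the trailing ";" and the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Hypothetical page source; a real page would use its own variable name and layout
sample_html = """
<html><body>
<script>
  window.pageData = {"symbol": "AAPL", "annual": [{"endDate": "9/26/2015", "totalRevenue": 233715000}]};
</script>
</body></html>
"""

data = extract_embedded_json(sample_html, "window.pageData")
print(data["annual"][0]["totalRevenue"])
```

With a real page you would pass the fetched HTML (for example `driver.page_source`) and whatever marker you actually find when inspecting the scripts.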