Automatically downloading a file from Barchart.com using Python

Question (votes: 0, answers: 2)

I would like to automatically download the table from this link: https://www.barchart.com/options/iv-rank-percentile/stocks

To do this, with the help of a few tutorials, I wrote this code:

# Import libraries
from urllib.request import Request, urlopen
import requests
from bs4 import BeautifulSoup as soup

# Set the URL you want to webscrape from
url = 'https://www.barchart.com/options/iv-rank-percentile/stocks?viewName=main'

# Connect to the URL
response = requests.get(url)
print(response)

req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
print(req)

# Parse the HTML into a BeautifulSoup object
page_soup = soup(webpage, "html.parser")
#print(page_soup)

containers = page_soup.findAll("a", "toolbar-button download")

for container in containers:
    print(container)
    url = container.get('href')
    print(url)

The printed output is:

<Response [403]>
<urllib.request.Request object at 0x030766F0>
<a class="toolbar-button download" data-bc-download-button="  Stocks IV Rank and IV Percentile  "> <i class="bc-glyph-download"></i> <span>download</span></a>
None

I can't seem to find the "href".

At this point I am stuck on the next steps, because I don't know how to download the file without an "href" to follow.
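
The tag printed above hints at why: it carries `class` and `data-bc-download-button` attributes but no `href`, so the static HTML contains no link to follow, and the download is presumably triggered by JavaScript instead. A minimal check using only the standard library's `html.parser` (the `AnchorAttrs` helper is hypothetical, written here for illustration):

```python
from html.parser import HTMLParser

# The anchor tag exactly as printed in the output above.
TAG = ('<a class="toolbar-button download" '
       'data-bc-download-button="  Stocks IV Rank and IV Percentile  "> '
       '<i class="bc-glyph-download"></i> <span>download</span></a>')


class AnchorAttrs(HTMLParser):
    """Collects the attribute dict of every <a> start tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


parser = AnchorAttrs()
parser.feed(TAG)
parser.close()
attrs = parser.anchors[0]

print(sorted(attrs))      # names of the attributes present on the tag
print(attrs.get("href"))  # None, matching container.get('href') above
```

Because the attribute is simply absent, no change to the BeautifulSoup query will produce a download link from this element.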

Could someone help, or suggest another approach?

Many thanks in advance,

Market Wizard

python web-scraping beautifulsoup
2 Answers

3 votes

The data is loaded dynamically via JavaScript from a different URL. You can load it as in this example:

import json
import requests
from urllib.parse import unquote


headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
}

url = "https://www.barchart.com/proxies/core-api/v1/quotes/get?list=options.mostActive.us&fields=symbol,symbolName,lastPrice,priceChange,percentChange,optionsTotalVolume,optionsWeightedImpliedVolatility,optionsImpliedVolatilityRank1y,optionsImpliedVolatilityPercentile1y,optionsWeightedImpliedVolatilityHigh1y,tradeTime,symbolCode,symbolType,hasOptions&between(lastPrice,.10,)=&between(tradeTime,2021-03-22,2021-03-23)=&orderBy=optionsTotalVolume&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1"

with requests.Session() as s:
    # get all cookies
    s.get(
        "https://www.barchart.com/options/iv-rank-percentile/stocks",
        headers=headers,
    )
    # use one cookie as HTTP header
    headers["X-XSRF-TOKEN"] = unquote(s.cookies["XSRF-TOKEN"])
    data = s.get(url, headers=headers).json()


# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for d in data["data"]:
    print("{:<8}{:<50}{}".format(d["symbol"], d["symbolName"], d["lastPrice"]))

Prints:

AAPL    Apple Inc                                         123.39
TSLA    Tesla Inc                                         670.00
FB      Facebook Inc                                      293.54
AMC     Amc Entertainment Holdings Inc                    12.49
PLTR    Palantir Technologies Inc Cl A                    24.22
NIO     Nio Inc                                           42.94
AMD     Adv Micro Devices                                 80.30
F       Ford Motor Company                                12.85
SNDL    Sundial Growers Inc                               1.3000
BAC     Bank of America Corp                              37.66
MSFT    Microsoft Corp                                    235.99
BABA    Alibaba Group Holding                             237.12
BA      Boeing Company                                    251.23
GE      General Electric Company                          13.13
AAL     American Airlines Gp                              23.83
DKNG    Draftkings Inc                                    71.72
WFC     Wells Fargo & Company                             38.97
AMZN    Amazon.com Inc                                    3,110.87
GM      General Motors Company                            58.10
INTC    Intel Corp                                        65.63
GME     Gamestop Corp                                     194.49
SNAP    Snap Inc                                          58.16
SOS     Sos Ltd                                           6.90
PFE     Pfizer Inc                                        36.00
NOK     Nokia Corp                                        4.06
T       AT&T Inc                                          29.99
CCL     Carnival Corp                                     27.48
NVDA    Nvidia Corp                                       527.11
MARA    Marathon Digital Hldgs Inc                        39.97
FTCH    Farfetch Ltd Cl A                                 62.00
UBER    Uber Technologies Inc                             55.69
TLRY    Tilray Inc                                        23.90
DIS     Walt Disney Company                               192.86
FCEL    Fuelcell Energy Inc                               15.04
QS      Quantumscape Corp                                 64.29
SQ      Square                                            226.13
CCIV    Churchill Capital IV Cl A                         26.15
V       Visa Inc                                          208.00
CSCO    Cisco Systems Inc                                 50.30
XOM     Exxon Mobil Corp                                  55.91
FCX     Freeport-Mcmoran Inc                              35.01
JPM     JP Morgan Chase & Company                         150.97
PLUG    Plug Power Inc                                    38.91
NFLX    Netflix Inc                                       523.11
VALE    Vale S.A.                                         17.01
TEVA    Teva Pharmaceutical Industries Ltd                11.93
CLF     Cleveland-Cliffs Inc                              15.86
MU      Micron Technology                                 91.27
BOX     Box Inc                                           23.65
TSM     Taiwan Semiconductor Manufacturing                117.18
RIOT    Riot Blockchain Inc                               56.01
BLNK    Blink Charging Company                            40.66
VZ      Verizon Communications Inc                        56.59
UAL     United Airlines Holdings Inc                      58.33
QCOM    Qualcomm Inc                                      134.09
CLVS    Clovis Oncology Inc                               7.47
RLX     Rlx Technology Inc ADR                            10.15
LUMN    Centurylink                                       14.37
WMT     Wal-Mart Stores                                   132.37
TWTR    Twitter Inc                                       65.21
NCLH    Norwegian Cruise Ord                              28.65
GOOGL   Alphabet Cl A                                     2,030.69
C       Citigroup Inc                                     71.96
JD      Jd.com Inc Ads                                    84.97
BB      Blackberry Ltd                                    10.71
X       United States Steel Corp                          21.79
RKT     Rocket Companies Inc Cl A                         22.99
PDD     Pinduoduo Inc ADR                                 137.15
NLY     Annaly Capital Management Inc                     8.92
FUBO    Fubotv Inc                                        31.53
MO      Altria Group                                      51.64
DASH    Doordash Inc Cl A                                 135.91
UWMC    Uwm Hldg Corp                                     8.78
KSS     Kohl's Corp                                       58.74
DAL     Delta Air Lines Inc                               47.97
NKLA    Nikola Corp                                       15.55
LYFT    Lyft Inc Cl A                                     64.13
WKHS    Workhorse Grp                                     15.63
PENN    Penn Natl Gaming Inc                              113.16
CRM     Salesforce.com Inc                                215.17
XPEV    Xpeng Inc ADR                                     37.88
BCRX    Biocryst Pharma Inc                               11.80
ET      Energy Transfer LP                                8.10
PTON    Peloton Interactive Inc                           109.54
BIDU    Baidu Inc                                         266.13
NKE     Nike Inc                                          138.27
PSTH    Pershing Square Tontine Hldgs Cl A                25.89
ACB     Aurora Cannabis Inc                               9.70
PYPL    Paypal Holdings                                   244.38
TME     Tencent Music Entertainment Group ADR             30.87
CAN     Canaan Inc ADR                                    22.97
GOLD    Barrick Gold Corp                                 20.62
SPCE    Virgin Galactic Holdings Inc                      32.24
ZM      Zoom Video Communications Cl A                    328.50
NNDM    Nano Dimension Ads                                9.83
CVX     Chevron Corp                                      102.54
SPRT    Support.com Inc                                   7.10
OXY     Occidental Petroleum Corp                         27.46
COST    Costco Wholesale                                  334.49
USAT    USA Technologies Inc                              12.45
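
Since the original goal was to download the table as a file, the rows could also be written to CSV rather than printed. The following is a sketch under assumptions: each entry of `data["data"]` is taken to be a dict with string values for the selected fields, as the printed output suggests, and the `rows_to_csv` helper is hypothetical. The sample rows are copied from the output above so the block runs without network access.

```python
import csv
import io


def rows_to_csv(rows, fields, out):
    """Write the chosen fields of each row dict to `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells instead of raising KeyError.
        writer.writerow({f: row.get(f, "") for f in fields})


# Two sample rows copied from the printed output above; in real use
# this would be data["data"] from the API response.
sample = [
    {"symbol": "AAPL", "symbolName": "Apple Inc", "lastPrice": "123.39"},
    {"symbol": "AMZN", "symbolName": "Amazon.com Inc", "lastPrice": "3,110.87"},
]

buffer = io.StringIO()
rows_to_csv(sample, ["symbol", "symbolName", "lastPrice"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open a file with `open("iv_rank.csv", "w", newline="")` (the file name is only an example) and pass it as `out` together with `data["data"]`.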

0 votes

Thank you for this example!

It works; however, I have a few questions:

  • Where does this URL come from:
url = "https://www.barchart.com/proxies/core-api/v1/quotes/get?list=options.mostActive.us&fields=symbol,symbolName,lastPrice,priceChange,percentChange,optionsTotalVolume,optionsWeightedImpliedVolatility,optionsImpliedVolatilityRank1y,optionsImpliedVolatilityPercentile1y,optionsWeightedImpliedVolatilityHigh1y,tradeTime,symbolCode,symbolType,hasOptions&between(lastPrice,.10,)=&between(tradeTime,2021-03-22,2021-03-23)=&orderBy=optionsTotalVolume&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1"
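  • Under this link, https://www.barchart.com/options/iv-rank-percentile/stocks?viewName=main, there are several pages of results (5 pages of 100 results each), and there is a "Show All" button that displays all 500 results. The example you provided only reads the first page. How can I get the full results?

  • Regarding the "download" button at the top right of the table: is it possible to "click" it to download the table?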
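
On the first two points, a hedged sketch: URLs like this are usually found by opening the browser's developer tools and watching the Network tab for the request that returns JSON while the page loads. The URL above already contains `page=1&limit=100`, which suggests (but does not prove) that further pages can be requested by changing `page`. The helpers `with_page` and `fetch_all` below are hypothetical names, the stopping rule (an empty `data` list) is an assumption about the API, and `base` is a shortened form of the URL used only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page, limit=100):
    """Return `url` with its `page` and `limit` query parameters replaced."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in ("page", "limit")]
    query += [("page", str(page)), ("limit", str(limit))]
    # Keep commas and parentheses unescaped, as in the original URL.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",()")))


def fetch_all(url, get_json, max_pages=10):
    """Collect rows page by page until a page comes back empty."""
    rows = []
    for page in range(1, max_pages + 1):
        payload = get_json(with_page(url, page))
        batch = payload.get("data", [])
        if not batch:
            break
        rows.extend(batch)
    return rows


# Shortened URL for illustration; the real one has many more parameters.
base = ("https://www.barchart.com/proxies/core-api/v1/quotes/get"
        "?list=options.mostActive.us&page=1&limit=100&raw=1")


def fake_get_json(u):
    """Stand-in for a network call: three non-empty pages, then an empty one."""
    page = int(dict(parse_qsl(urlsplit(u).query))["page"])
    return {"data": [{"symbol": f"S{page}"}] if page <= 3 else []}


page_two = with_page(base, 2)
all_rows = fetch_all(base, fake_get_json)
print(page_two)
print(all_rows)
```

With the session from the answer, the fake fetcher would be replaced by something like `lambda u: s.get(u, headers=headers).json()`, called inside the `with requests.Session() as s:` block. Passing `limit=500` in a single request might also work, but whether the server accepts it is untested here.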
