Get table data from a PHP web page into a dataframe


This web page (https://www.nseindia.com/market-data/top-gainers-losers) has 2 tables ('gainers' and 'losers').

I want code that reads the content of the web page and downloads these two tables into two separate dataframes. How can I achieve this?

python python-3.x dataframe
1 Answer

This page uses JavaScript to generate its content, so it would normally need Selenium to control a real web browser that can run JavaScript.

However, there is a Download .csv button that downloads a table as CSV.

The button has no URL, only onclick="downloadCSVFile('loosers')". Maybe you could download the file by running downloadCSVFile('loosers') in Selenium.

Instead, I downloaded the file in Firefox, opened the Download Manager, selected the downloaded file, and used Copy Download Link to get the link to the file:

Gainers:
https://www.nseindia.com/api/live-analysis-variations?index=gainers&type=NIFTY&csv=true

Losers:
https://www.nseindia.com/api/live-analysis-variations?index=loosers&type=NIFTY&csv=true
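
The two links differ only in the index query parameter, so they can be built from one template. A minimal sketch using only the standard library; csv_url and index_type are hypothetical names for illustration, and NIFTY is the only type value seen above, so other values are untested:

```python
from urllib.parse import urlencode

API_URL = "https://www.nseindia.com/api/live-analysis-variations"


def csv_url(index, index_type="NIFTY"):
    """Build the CSV download link for one table (hypothetical helper)."""
    query = urlencode({"index": index, "type": index_type, "csv": "true"})
    return f"{API_URL}?{query}"


gainers_url = csv_url("gainers")
losers_url = csv_url("loosers")  # "loosers" is the spelling the site itself uses

print(gainers_url)
print(losers_url)
```

Note that the losers table needs the misspelled value loosers, exactly as it appears in the onclick handler.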

Next I tested whether it can be downloaded with requests:

import requests

session = requests.Session()
# Without a browser-like User-Agent the request hangs
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0'}

# Load the main page first, probably needed to get some cookies
url = 'https://www.nseindia.com/market-data/top-gainers-losers'
response = session.get(url, headers=headers)
#print(response.text)

# Request the CSV file through the same session
url = 'https://www.nseindia.com/api/live-analysis-variations?index=gainers&type=NIFTY&csv=true'
response = session.get(url, headers=headers)
response.encoding = "utf-8-sig"  # skip the BOM at the start of the text
print(response.text)

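First: it needs the User-Agent header to download anything from the server. Without this header the request hangs.

Second: the main page has to be fetched before the download, probably to get some cookies.

Third: the returned text starts with the characters ï»¿, which are the BOM (Byte Order Mark). You need to use the utf-8-sig encoding to skip it.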
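
To see what the BOM does without touching the server, here is a small self-contained sketch. The payload is made-up sample data, not a real response; it only shows how the same bytes look under three encodings:

```python
import codecs

# Made-up payload: a UTF-8 BOM followed by two CSV lines
raw = codecs.BOM_UTF8 + '"Symbol","Open"\n"ITC",418\n'.encode("utf-8")

as_latin1 = raw.decode("iso-8859-1")   # BOM bytes become three stray characters
as_utf8 = raw.decode("utf-8")          # BOM survives as the invisible U+FEFF
as_utf8_sig = raw.decode("utf-8-sig")  # BOM is removed

print(repr(as_latin1[:4]))
print(repr(as_utf8[:2]))
print(repr(as_utf8_sig[:2]))
```

Only the utf-8-sig result starts directly with the quoted header, which is why response.encoding is set to it before reading response.text.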

This gives me:

"Symbol","Open","High","Low","Prev. Close","LTP","%chng","Volume","Value","CA "
"BAJFINANCE",6840.05,7150,6810.05,6893.2,7110,3.15,1218375,8520583725,"30-Jun-2023"
"M&M",2034,2087,1998.2,2024.95,2084,2.92,3253248,6691117824,"14-Jul-2023"
"HDFCBANK",1486.55,1534.95,1480.25,1494.7,1534.2,2.64,17288217,26183869057.35,"16-May-2023"
"MARUTI",12399.9,12759.4,12225,12405,12690,2.3,635535,7949392531.65,"03-Aug-2023"
"JSWSTEEL",842.85,867.3,833.2,844.8,864,2.27,3157898,2697823840.38,"11-Jul-2023"
"BHARTIARTL",1280,1296.5,1253.35,1265.75,1289.95,1.91,13103862,16751846142.18,"11-Aug-2023"
"GRASIM",2220,2290.75,2201.35,2226.05,2266.9,1.84,1061064,2394651677.76,"10-Jan-2024"
"WIPRO",440,453.9,437,444.35,452.1,1.74,10235053,4572612278.28,"24-Jan-2024"
"BAJAJFINSV",1587,1628.75,1568.7,1593.9,1618.6,1.55,1242066,1990398344.34,"30-Jun-2023"
"APOLLOHOSP",6140,6199,6050,6074.15,6155,1.33,560178,3437179384.86,"20-Feb-2024"
"ITC",418,426.25,416,418.85,424,1.23,16582634,7025067067.76,"08-Feb-2024"
"ICICIBANK",1052.95,1072,1048.1,1055.45,1068,1.19,11284433,11984406378.99,"09-Aug-2023"
"ADANIPORTS",1280,1316,1270,1295.55,1310.8,1.18,3899281,5057640406.67,"28-Jul-2023"
"TATASTEEL",160,162.5,157.3,160.05,161.9,1.16,60078229,9640753407.63,"22-Jun-2023"
"TECHM",1163.05,1204.85,1162.95,1179.65,1192.5,1.09,2572144,3057173194.08,"02-Nov-2023"
"TITAN",3525.1,3571.2,3478.25,3525.1,3562,1.05,1507940,5329859168.2,"13-Jul-2023"
"AXISBANK",1015,1036.95,995.7,1024,1031.7,0.75,21598007,21821762352.52,"07-Jul-2023"
"SBIN",734.5,752,732.05,744.8,750,0.7,10886554,8092302184.82,"31-May-2023"
"HINDUNILVR",2220,2243.75,2196,2214.8,2230,0.69,2337694,5205834145.54,"02-Nov-2023"
"INDUSINDBK",1466,1490.25,1444.4,1474.4,1483,0.58,4311650,6341790402.5,"02-Jun-2023"

Using io, I can load it into pandas:

import pandas as pd
import io

df = pd.read_csv(io.StringIO(response.text))

print(df)

Result:

        Symbol      Open      High       Low  Prev. Close       LTP  %chng    Volume         Value          CA
0   BAJFINANCE   6840.05   7150.00   6810.05      6893.20   7110.00   3.15   1218375  8.520584e+09  30-Jun-2023
1          M&M   2034.00   2087.00   1998.20      2024.95   2084.00   2.92   3253248  6.691118e+09  14-Jul-2023
2     HDFCBANK   1486.55   1534.95   1480.25      1494.70   1534.20   2.64  17288217  2.618387e+10  16-May-2023
3       MARUTI  12399.90  12759.40  12225.00     12405.00  12690.00   2.30    635535  7.949393e+09  03-Aug-2023
4     JSWSTEEL    842.85    867.30    833.20       844.80    864.00   2.27   3157898  2.697824e+09  11-Jul-2023
5   BHARTIARTL   1280.00   1296.50   1253.35      1265.75   1289.95   1.91  13103862  1.675185e+10  11-Aug-2023
6       GRASIM   2220.00   2290.75   2201.35      2226.05   2266.90   1.84   1061064  2.394652e+09  10-Jan-2024
7        WIPRO    440.00    453.90    437.00       444.35    452.10   1.74  10235053  4.572612e+09  24-Jan-2024
8   BAJAJFINSV   1587.00   1628.75   1568.70      1593.90   1618.60   1.55   1242066  1.990398e+09  30-Jun-2023
9   APOLLOHOSP   6140.00   6199.00   6050.00      6074.15   6155.00   1.33    560178  3.437179e+09  20-Feb-2024
10         ITC    418.00    426.25    416.00       418.85    424.00   1.23  16582634  7.025067e+09  08-Feb-2024
11   ICICIBANK   1052.95   1072.00   1048.10      1055.45   1068.00   1.19  11284433  1.198441e+10  09-Aug-2023
12  ADANIPORTS   1280.00   1316.00   1270.00      1295.55   1310.80   1.18   3899281  5.057640e+09  28-Jul-2023
13   TATASTEEL    160.00    162.50    157.30       160.05    161.90   1.16  60078229  9.640753e+09  22-Jun-2023
14       TECHM   1163.05   1204.85   1162.95      1179.65   1192.50   1.09   2572144  3.057173e+09  02-Nov-2023
15       TITAN   3525.10   3571.20   3478.25      3525.10   3562.00   1.05   1507940  5.329859e+09  13-Jul-2023
16    AXISBANK   1015.00   1036.95    995.70      1024.00   1031.70   0.75  21598007  2.182176e+10  07-Jul-2023
17        SBIN    734.50    752.00    732.05       744.80    750.00   0.70  10886554  8.092302e+09  31-May-2023
18  HINDUNILVR   2220.00   2243.75   2196.00      2214.80   2230.00   0.69   2337694  5.205834e+09  02-Nov-2023
19  INDUSINDBK   1466.00   1490.25   1444.40      1474.40   1483.00   0.58   4311650  6.341790e+09  02-Jun-2023
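
Putting the steps together for both tables, something like the following could return the two dataframes the question asks for. This is only a sketch under the assumptions above: fetch_tables, csv_to_frame and the timeout value are names and choices I made up for illustration, the site may change its API or cookie checks at any time, and I only tested the gainers link by hand.

```python
import io

import pandas as pd
import requests

BASE_URL = "https://www.nseindia.com"
PAGE_URL = BASE_URL + "/market-data/top-gainers-losers"
API_URL = BASE_URL + "/api/live-analysis-variations"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
}


def csv_to_frame(text):
    """Parse CSV text into a DataFrame (hypothetical helper)."""
    # Drop a leading BOM in case the text was decoded as plain utf-8
    df = pd.read_csv(io.StringIO(text.lstrip("\ufeff")))
    # The header "CA " comes with a trailing space, so tidy the column names
    df.columns = df.columns.str.strip()
    return df


def fetch_tables(timeout=10):
    """Return (gainers, losers) as two DataFrames (hypothetical helper)."""
    with requests.Session() as session:
        session.headers.update(HEADERS)
        # Visit the main page first, probably needed to get some cookies
        session.get(PAGE_URL, timeout=timeout)
        frames = []
        for index in ("gainers", "loosers"):  # "loosers" is the site's spelling
            params = {"index": index, "type": "NIFTY", "csv": "true"}
            response = session.get(API_URL, params=params, timeout=timeout)
            response.raise_for_status()
            response.encoding = "utf-8-sig"  # skip the BOM
            frames.append(csv_to_frame(response.text))
    return tuple(frames)
```

Calling gainers, losers = fetch_tables() would then give one dataframe per table. The timeout is there so that a blocked request raises an error instead of hanging.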