How to find and decode URL-encoded strings in a website's HTML/JavaScript to scrape live odds from OddsPortal?

Question

I'm working on a project to get live odds for individual matches from OddsPortal (https://www.oddsportal.com/inplay-odds/live-now/football/), based on this helpful guide: https://github.com/jckkrr/Unlayering_Oddsportal.

My goal is to get live odds data for every match, but I'm running into trouble accessing the necessary URLs.

Using Python's requests library, I can get a list of all live matches from this feed URL: https://www.oddsportal.com/feed/livegames/liveOdds/0/0.dat?_=

    import requests
    
    url = "https://www.oddsportal.com/feed/livegames/liveOdds/0/0.dat?_="
    response = requests.get(url)
    data = response.text
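The `_=` parameter is left empty in the request above. It looks like the jQuery-style cache-busting timestamp (milliseconds since the epoch) that browsers append to feed requests; that is an assumption, not something confirmed for OddsPortal. A minimal sketch that fills it in and sends a browser-like `User-Agent` and `Referer` (also assumptions; the site may check other headers or block scripted clients entirely):

```python
import time

FEED_URL = "https://www.oddsportal.com/feed/livegames/liveOdds/0/0.dat"

# Assumed browser-like headers; OddsPortal may expect others.
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://www.oddsportal.com/inplay-odds/live-now/football/",
}


def cache_busted_url(base_url, now_ms=None):
    """Append `?_=<milliseconds>`, assumed to be a cache-busting timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{base_url}?_={now_ms}"


def fetch_live_games(timeout=10):
    """Fetch the live-games feed as text, raising on HTTP errors."""
    import requests  # imported here so the URL helper works without requests installed

    response = requests.get(cache_busted_url(FEED_URL), headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text
```

`data = fetch_live_games()` can then stand in for the plain `requests.get` call.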

The problem arises when I try to get the live odds for each individual match.

The odds are contained in separate URLs with the following structure:

https://fb.oddsportal.com/feed/match/1-1-{match_id_code}-1-2-{secondary_id_code}.dat

Here is a screenshot of a single live match page and its odds feed URL, https://www.oddsportal.com/feed/live-event/1-1-AsILkjnd-1-2-yjbd1.dat (once the live match ends, the odds URL returns 404).

In this example (from the screenshot), the first ID code, AsILkjnd, can be found in the list of all live matches returned by this feed URL: https://www.oddsportal.com/feed/livegames/liveOdds/0/0.dat?_=

But the secondary_id_code can't be found there, or even in the HTML of the individual match page.
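Both example URLs share the ID layout `1-1-{match_id_code}-1-2-{secondary_id_code}.dat`, although the host and path differ (`fb.oddsportal.com/feed/match/` vs `www.oddsportal.com/feed/live-event/`). A small sketch, with hypothetical helper names, that builds a match feed URL from the two codes and, in reverse, extracts them from any feed URL spotted in the browser. The regex assumes both codes are purely alphanumeric, and `fetch_match_feed` treats a 404 as "match over", as described above.

```python
import re

# Assumed layout shared by both example URLs: .../1-1-{match_id}-1-2-{secondary_id}.dat
FEED_ID_RE = re.compile(r"/1-1-([A-Za-z0-9]+)-1-2-([A-Za-z0-9]+)\.dat")

MATCH_FEED_TEMPLATE = "https://fb.oddsportal.com/feed/match/1-1-{match_id}-1-2-{secondary_id}.dat"


def build_match_feed_url(match_id, secondary_id, template=MATCH_FEED_TEMPLATE):
    """Fill both ID codes into the per-match feed URL template."""
    return template.format(match_id=match_id, secondary_id=secondary_id)


def parse_feed_ids(url):
    """Return (match_id_code, secondary_id_code) from a feed URL, or None."""
    m = FEED_ID_RE.search(url)
    return m.groups() if m else None


def fetch_match_feed(url, timeout=10):
    """Return the feed text, or None if the server answers 404 (match finished)."""
    import requests  # imported here so the URL helpers work without requests installed

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout)
    if response.status_code == 404:
        return None
    response.raise_for_status()
    return response.text
```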

I'm currently working on finding and decoding the secondary_id_code.

It appears to be a URL-encoded string similar to %79%6a%39%64%39, which I believe is hidden somewhere in the site's HTML or JavaScript code.
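If the code really is stored as a run of `%XX` escapes, a regular expression can pick every such run out of a saved HTML page or JavaScript file and decode it in one pass. A sketch, assuming the escapes appear literally in the source (if they are assembled at runtime, e.g. by string concatenation, this will not find them); requiring at least two consecutive escapes filters out ordinary URL encoding such as a lone `%20`.

```python
import re
from urllib.parse import unquote

# Two or more consecutive %XX escapes, e.g. %79%6a%39%64%39
ESCAPE_RUN_RE = re.compile(r"(?:%[0-9A-Fa-f]{2}){2,}")


def find_encoded_strings(source):
    """Return (encoded, decoded) pairs for every %XX run found in `source`."""
    return [(m.group(0), unquote(m.group(0))) for m in ESCAPE_RUN_RE.finditer(source)]
```

For example, `find_encoded_strings("var k = '%79%6a%39%64%39';")` returns `[('%79%6a%39%64%39', 'yj9d9')]`; the same call works on `driver.page_source` or the text of a downloaded `.js` file.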

So far I haven't been able to find these encoded strings.

Can anyone help with how to find and decode these URL-encoded strings?

Answer 1

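Since the secondary_id_code isn't readily available, it is most likely loaded onto the page dynamically via JavaScript. Sites like OddsPortal often use JavaScript to load data after the initial page load, which means that simply fetching the page's HTML may not show everything the browser ends up displaying. Here is how to approach this: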

1. Analyze network traffic

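Open your browser's developer tools (usually by pressing F12, or by right-clicking and choosing "Inspect") and go to the "Network" tab.
Reload the page and watch for XHR (XMLHttpRequest) or Fetch requests made after the initial page load. These requests often retrieve dynamic content such as your secondary_id_code. Typing `.dat` into the Network tab's filter box narrows the list to feed requests, and the per-match odds feed URL already contains the secondary_id_code.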
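Once the odds request shows up in the Network tab, you can export the session as a HAR file (Chrome offers this from the Network tab's right-click menu or export icon; the exact wording varies by version) and filter it offline. A sketch, assuming the standard HAR 1.2 layout (`log.entries[].request.url`) and that the interesting requests contain `/feed/` in their URL; the helper names are hypothetical.

```python
import json


def load_har(path):
    """Load a HAR file exported from the browser's Network tab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def feed_urls_from_har(har, marker="/feed/"):
    """Return the request URLs in a HAR dict that contain `marker`, in order."""
    entries = har.get("log", {}).get("entries", [])
    return [e["request"]["url"] for e in entries if marker in e["request"]["url"]]
```

Usage: `feed_urls_from_har(load_har("oddsportal.har"))`, where the file name is whatever you saved. Any per-match feed URL in the result carries both ID codes, following the URL structure given in the question.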

2. Use Selenium or a similar tool:

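Because the secondary_id_code may be loaded dynamically, consider using Selenium, a tool for automating web browsers. Selenium executes JavaScript just like a real browser, so you can access dynamically loaded data.
Here is a simplified way to access dynamic content with Selenium: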


    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    
    # Path to your WebDriver (e.g., ChromeDriver). Recent Selenium 4 releases can also
    # locate a driver automatically if you call webdriver.Chrome() without a Service.
    driver_path = '/path/to/your/chromedriver'
    
    # URL of the live matches page
    url = 'https://www.oddsportal.com/inplay-odds/live-now/football/'
    
    # Initialize the WebDriver and open the URL.
    # Selenium 4.10+ no longer accepts executable_path; the driver path goes in a Service.
    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get(url)
    
        # You may need to wait for dynamically loaded content to appear;
        # Selenium provides explicit (WebDriverWait) and implicit waits for this.
    
        # Now you can search the rendered DOM for the secondary_id_code, e.g. in
        # driver.page_source or in an element that contains it, or inspect the
        # AJAX requests that might carry it (see step 1).
        html = driver.page_source
    finally:
        # Always close the WebDriver, even if something above fails
        driver.quit()
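A variant of this approach, swapped in here rather than taken from the answer above, lets Chrome record its own network traffic so the feed URLs can be read back programmatically instead of copied from DevTools. A sketch assuming Chrome with ChromeDriver's `goog:loggingPrefs` performance logging; the `/feed/` filter and the fixed sleep are simplifications, and the helper names are hypothetical.

```python
import json


def feed_urls_from_perf_log(entries, marker="/feed/"):
    """Extract unique request URLs containing `marker` from Chrome performance-log entries."""
    urls = []
    for entry in entries:
        # Each entry's "message" is a JSON string wrapping a DevTools protocol event
        message = json.loads(entry["message"])["message"]
        if message.get("method") == "Network.requestWillBeSent":
            url = message["params"]["request"]["url"]
            if marker in url and url not in urls:
                urls.append(url)
    return urls


def capture_feed_urls(page_url, wait_seconds=10):
    """Open `page_url` in Chrome and return the feed URLs the page requested."""
    import time

    from selenium import webdriver  # imported here; only this function needs Selenium

    options = webdriver.ChromeOptions()
    # Ask ChromeDriver to record DevTools network events in the "performance" log
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)  # driver found via PATH or Selenium Manager
    try:
        driver.get(page_url)
        time.sleep(wait_seconds)  # crude wait for XHR/fetch calls to fire
        return feed_urls_from_perf_log(driver.get_log("performance"))
    finally:
        driver.quit()
```

Calling `capture_feed_urls(url)` on an individual live match page should list the odds feed request if the page makes one while it is open.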

3. Decode the secondary_id_code

If you find the secondary_id_code but it is URL-encoded (like %79%6a%39%64%39), you can decode it with Python's urllib.parse.unquote() function:


    from urllib.parse import unquote
    
    encoded_str = '%79%6a%39%64%39'
    decoded_str = unquote(encoded_str)
    print(decoded_str)  # prints: yj9d9
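Working in the other direction can help with the search itself: if you already know a secondary code from a feed URL (for example `yjbd1` in the question), percent-encode every character and search the saved page source or JS files for that exact text. `urllib.parse.quote` leaves letters and digits untouched, so this hypothetical helper encodes each byte by hand; the check lowercases the source because the escapes could be written with either uppercase or lowercase hex digits.

```python
from urllib.parse import unquote


def percent_encode_all(text):
    """Percent-encode every byte, including letters and digits (unlike quote())."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


def source_contains(source, known_code):
    """Case-insensitive check for the fully percent-encoded form of `known_code`."""
    return percent_encode_all(known_code) in source.lower()


encoded = percent_encode_all("yjbd1")
print(encoded)           # %79%6a%62%64%31
print(unquote(encoded))  # yjbd1
```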
