Web scraping: when I use the Network tab in Chrome DevTools, I can't find the right URL that delivers the data I need


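I am trying to scrape a radio station website to get its current chart (https://www.energy.de/programm/energy-euro-hot-30, which leads to https://music.apple.com/de/playlist/energy-euro-hot-30/pl.9b672a18307c4cd7ba1ece0106891868). I'm using Python and the Requests-HTML module. When I analyze the HTML returned by the request, the elements I want to parse are not in it. However, if I inspect the page as displayed in the browser, I do find the data I need. I had a similar problem earlier this week, and a user (https://stackoverflow.com/users/10035985/andrej-kesely) helped me. He used Chrome DevTools and its Network tab to find the right link to access the data. I have now tried this approach myself for my current problem, but I am completely overwhelmed by the sheer number of connections. Maybe someone can point me in the right direction...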

I tried using Chrome DevTools and its Network tab to find the right link to get the data I need. I had no success.

python-3.x web-scraping google-chrome-devtools python-requests-html
1 Answer

You don't see anything in the Network tab because the data is stored inside a <script> element in the page itself. Here is an example of how to parse it:

import json

import requests
from bs4 import BeautifulSoup


def find_tracks(o):
    # Recursively walk the JSON and yield the items of every "trackLockup" section.
    if isinstance(o, dict):
        if o.get("itemKind") == "trackLockup":
            yield o["items"]
            return
        for v in o.values():
            yield from find_tracks(v)
    elif isinstance(o, list):
        for v in o:
            yield from find_tracks(v)


url = "https://music.apple.com/de/playlist/energy-euro-hot-30/pl.9b672a18307c4cd7ba1ece0106891868"

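# Fetch the page; the track data is embedded as JSON in the initial HTML.
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# The JSON lives in the element with id "serialized-server-data".
data = json.loads(soup.select_one("#serialized-server-data").text)

# Take the first "trackLockup" section found anywhere in the JSON tree.
tracks = next(find_tracks(data))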

# print(json.dumps(tracks, indent=4))

for track in tracks:
    print(f'{track["title"]:<55} {track["artistName"]}')

Prints:

Overdrive (feat. Norma Jean Martine)                    Ofenbach
Houdini                                                 Dua Lipa
Strangers                                               Kenya Grace
When We Were Young (The Logical Song)                   David Guetta & Kim Petras
greedy                                                  Tate McRae
Gimme Love                                              Sia
Lose Control                                            Teddy Swims
Cynical                                                 twocolors, Safri Duo & Chris de Sarandy
Lovin On Me                                             Jack Harlow
Si No Estás                                             Iñigo Quintero
Paint The Town Red                                      Doja Cat
Water                                                   Tyla
On My Love                                              Zara Larsson & David Guetta
Is It Love                                              Loreen
I'll Be There                                           Robin Schulz, Rita Ora & Tiago PZK
Dreaming                                                Marshmello, P!nk & Sting
American Town                                           Ed Sheeran
Is It Over Now? (Taylor's Version) [From The Vault]     Taylor Swift
Better Me                                               Michael Schulte & R3HAB
Mwaki                                                   ZERB
Substitution (feat. Julian Perretta)                    Purple Disco Machine & Kungs
RUNAWAY                                                 OneRepublic
Blindside                                               James Arthur
Dive                                                    Lost Frequencies & Tom Gregory
Tattoo                                                  Loreen
LOVE'n'TENDRESSE                                        Eddy de Pretto
Prada                                                   cassö, RAYE & D-Block Europe
Never Give Up                                           Puggy
Used To Be Young                                        Miley Cyrus
Seasons                                                 Thirty Seconds to Mars
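
If you want to try the same idea without hitting the network or installing third-party packages, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library's html.parser, which is a substitution and not what the answer above uses. SAMPLE_HTML and its JSON shape are made up for illustration, and ScriptByIdParser and extract_embedded_json are hypothetical helper names; the real page's structure may differ and can change at any time.

```python
import json
from html.parser import HTMLParser


class ScriptByIdParser(HTMLParser):
    """Collect the text inside any <script> tag that has the given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_embedded_json(html, script_id):
    """Return the parsed JSON from the matching <script>, or None if absent."""
    parser = ScriptByIdParser(script_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return json.loads(text) if text.strip() else None


def find_tracks(o):
    # Same recursive search as in the answer above.
    if isinstance(o, dict):
        if o.get("itemKind") == "trackLockup":
            yield o["items"]
            return
        for v in o.values():
            yield from find_tracks(v)
    elif isinstance(o, list):
        for v in o:
            yield from find_tracks(v)


# Hypothetical stand-in for the downloaded page (assumed structure).
SAMPLE_HTML = """
<html><body>
<script type="application/json" id="serialized-server-data">
[{"data": {"sections": [{"itemKind": "trackLockup",
  "items": [{"title": "Song A", "artistName": "Artist A"},
            {"title": "Song B", "artistName": "Artist B"}]}]}}]
</script>
</body></html>
"""

data = extract_embedded_json(SAMPLE_HTML, "serialized-server-data")
# Fall back to an empty list if no trackLockup section exists.
tracks = next(find_tracks(data), [])
titles = [t["title"] for t in tracks]
print(titles)
```

Passing a default to next() avoids a StopIteration when the page layout changes and no matching section is found. A quick check before digging through the Network tab is to search the raw HTML returned by the request for a value you can see in the browser; if it is there, the data is embedded in the page and not loaded by a separate request.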