Problem downloading Starcraft 2 MP3s from a website using Python

Problem description · Votes: 0 · Answers: 1

I adapted the code from this stackoverflow.com question, How do I download music files from a website using #Python, to grab the Starcraft 2 sounds from https://nuclearlaunchdetected.com/. When I run the code, every file comes out as 360 bytes and is corrupted, yet the same files are fine when I download them manually. Here is the code I'm currently using. Any help would be appreciated!

import requests
from bs4 import BeautifulSoup
import os

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}

url = 'https://nuclearlaunchdetected.com/'
response = requests.get(url, verify=False)
soup = BeautifulSoup(response.text, 'html.parser')
links = soup.find_all('a')
download_dir = 'C:/Users/SC2 Sounds/'
counter = 0


for link in links:
    if link['href'].endswith('.mp3'):
        file_url = url + '/' + link['href']
        file_name = os.path.basename(file_url)
        print('Downloading:', file_name)
        with open(download_dir + file_name, 'wb') as f:
            f.write(requests.get(file_url, headers=headers).content)
        counter += 1
        if counter == 10:
            break

The code above runs without errors, but the files turn out to be corrupted. I expected playable, non-corrupted files like the ones I get when downloading manually. I've also limited the script to the first 10 files for testing purposes.
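A quick way to see what those 360-byte files actually contain is to inspect the response for one of the constructed URLs before writing it to disk. This diagnostic snippet is not part of my original script, and sample_href is a made-up placeholder standing in for one of the hrefs found on the page.

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}

url = 'https://nuclearlaunchdetected.com/'
sample_href = 'https://nuclearlaunchdetected.com/sounds/example.mp3'  # placeholder href
file_url = url + '/' + sample_href  # same concatenation as in the loop above

resp = requests.get(file_url, headers=headers)
print(resp.status_code)                   # likely an error status rather than 200
print(resp.headers.get('Content-Type'))   # likely text/html instead of audio/mpeg
print(resp.content[:200])                 # the first bytes of the small error body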

python html web-scraping mp3 starcraftgym
1 Answer
0 votes

The file_url construction is incorrect. It should be file_url = link['href'] rather than file_url = url + '/' + link['href']; the url should not be prepended to the href. The hrefs on that page already appear to be full URLs, so prepending the base url produces a broken address, and the small error response is what ends up in the 360-byte files. Printing file_url with print(file_url) helps when debugging. Here is the modified code:

import requests
from bs4 import BeautifulSoup
import os

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}

url = 'https://nuclearlaunchdetected.com/'
response = requests.get(url, verify=False)
soup = BeautifulSoup(response.text, 'html.parser')
links = soup.find_all('a')
download_dir = 'C:/Users/SC2 Sounds/'
counter = 0

for link in links:
    if link['href'].endswith('.mp3'):
        # The href already contains the full URL of the MP3, so use it as-is.
        file_url = link['href']
        file_name = os.path.basename(file_url)
        print('Downloading:', file_name)
        with open(download_dir + file_name, 'wb') as f:
            f.write(requests.get(file_url, headers=headers).content)
        counter += 1
        if counter == 10:  # only download the first 10 files for testing
            break
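As a side note, if some hrefs on the page were relative rather than absolute, urllib.parse.urljoin would handle both cases, which is a more robust way to build the download URL than either concatenating or using the href as-is. The sketch below is an optional variation on the code above, not part of the fix itself; the raise_for_status() checks and the href=True filter are additions.

import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}

url = 'https://nuclearlaunchdetected.com/'
download_dir = 'C:/Users/SC2 Sounds/'
counter = 0

response = requests.get(url, headers=headers)
response.raise_for_status()  # fail loudly if the page itself could not be fetched
soup = BeautifulSoup(response.text, 'html.parser')

# href=True skips <a> tags that have no href attribute at all.
for link in soup.find_all('a', href=True):
    if not link['href'].endswith('.mp3'):
        continue
    # urljoin leaves absolute hrefs untouched and resolves relative ones against url.
    file_url = urljoin(url, link['href'])
    file_name = os.path.basename(file_url)
    print('Downloading:', file_name)
    audio = requests.get(file_url, headers=headers)
    audio.raise_for_status()  # avoid writing an HTML error page to disk as .mp3
    with open(os.path.join(download_dir, file_name), 'wb') as f:
        f.write(audio.content)
    counter += 1
    if counter == 10:
        break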