Python: Download a file from Google Drive using its URL

Question · Votes: 0 · Answers: 13

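I am trying to download a file from Google Drive, and all I have is the Drive URL.

I have read about the Google API, which talks about some drive_service and MediaIoBaseDownload. These also require credentials (mainly a JSON file/OAuth), but I can't figure out how it works.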

I also tried urllib2.urlretrieve, but my case is fetching a file from Drive. I also tried wget, but it didn't work.

I tried the PyDrive library. It has good upload functionality for Drive but no download option.

Any help would be greatly appreciated. Thanks.

python download google-drive-api urllib2 pydrive
13 answers
122 votes

If by "drive URL" you mean the shareable link of a file on Google Drive, the following might help:

import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)

The snippet uses neither pydrive nor the Google Drive SDK. It uses the requests module (which is, in a way, an alternative to urllib2).

When downloading large files from Google Drive, a single GET request is not sufficient; a second one is needed. See wget/curl large file from google drive.
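Because the two-request trick depends on how Google answers large-file requests, it can help to check the response before writing it to disk. Otherwise an HTML warning page may be saved silently under the file's name. The following is a minimal sketch. It assumes that an HTML Content-Type means Google returned its warning page rather than the file, which would misfire if the file you are downloading is itself an HTML document. `looks_like_warning_page` is a hypothetical helper, not part of requests:

```python
def looks_like_warning_page(headers):
    """Return True if the response headers announce an HTML page.

    Assumption: for a binary download, an HTML Content-Type means Google
    sent its "can't scan this file" warning page instead of the file.
    """
    # HTTP header names are case-insensitive; normalise them for plain dicts
    normalised = {key.lower(): value for key, value in headers.items()}
    content_type = normalised.get("content-type", "")
    # Drop parameters such as "; charset=utf-8" before comparing
    return content_type.split(";")[0].strip().lower() == "text/html"
```

For example, `if looks_like_warning_page(response.headers): raise RuntimeError("got an HTML page instead of the file")` could go just before the `save_response_content(response, destination)` call.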


70 votes

I recommend the gdown package.

pip install gdown

Take your share link

https://drive.google.com/file/d/0B9P1L--7Wd2vNm9zMTJWOGxobkU/view?usp=sharing

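and get its ID: the segment after /file/d/, here 0B9P1L--7Wd2vNm9zMTJWOGxobkU (another example of an ID is 1TLNdIufzwesDbyr_nVTR7Zrx9oRHLM_N). You can also find it by pressing the download button and looking at the link. Then put it after id= in the URL below.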
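If you would rather not copy the ID by hand, you can build the `uc?id=` URL that the gdown example uses directly from the share link. The following is a minimal sketch that assumes the link has the `/file/d/<ID>/` shape shown above. `share_link_to_uc_url` is a hypothetical helper:

```python
import re

# Matches the ID in links like https://drive.google.com/file/d/<ID>/view?usp=sharing
_FILE_D_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def share_link_to_uc_url(share_link):
    """Turn a .../file/d/<ID>/view link into https://drive.google.com/uc?id=<ID>."""
    match = _FILE_D_PATTERN.search(share_link)
    if match is None:
        raise ValueError(f"no /file/d/<ID> segment in {share_link!r}")
    return f"https://drive.google.com/uc?id={match.group(1)}"
```

You can pass the result straight to `gdown.download(url, output, quiet=False)` as in the snippet below.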

import gdown

url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)

58 votes

Having had similar needs many times, I made an extra simple class, GoogleDriveDownloader, starting from @user115202's snippet above. You can find the source code here.

You can also install it via pip:

pip install googledrivedownloader

Usage is then as simple as:

from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1iytA1n2z4go3uVCwE__vIKouTKyIDjEq',
                                    dest_path='./data/mnist.zip',
                                    unzip=True)

This snippet downloads an archive shared on Google Drive. In this case, 1iytA1n2z4go3uVCwE__vIKouTKyIDjEq is the ID of the shareable link obtained from Google Drive.


11 votes

Here is an easy way to do it with a service account and no third-party libraries, using only Google's own client libraries.

Install with pip:

pip install google-api-core google-api-python-client

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io

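credz = {}  # put the JSON credentials of your service account (or similar) here
# More info: https://cloud.google.com/docs/authentication
# The service account can only read files it has access to (e.g. share the file with its e-mail address)

credentials = service_account.Credentials.from_service_account_info(
    credz, scopes=['https://www.googleapis.com/auth/drive.readonly'])
drive_service = build('drive', 'v3', credentials=credentials)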

file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
#fh = io.BytesIO() # this can be used to keep in memory
fh = io.FileIO('file.tar.gz', 'wb') # this can be used to write to disk
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
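The polling loop above can be factored into a reusable function. It relies only on `next_chunk()` returning a `(status, done)` pair, where `status.progress()` is a fraction between 0 and 1. It therefore accepts a real `MediaIoBaseDownload` as well as a stand-in object for testing. `run_download` is a hypothetical helper name, and the `None` check on `status` is a defensive assumption rather than documented behaviour:

```python
def run_download(downloader):
    """Call next_chunk() until done; print and return the progress percentages."""
    percentages = []
    done = False
    while not done:
        status, done = downloader.next_chunk()
        if status is not None:  # defensive: skip progress reporting if absent
            percent = int(status.progress() * 100)
            percentages.append(percent)
            print(f"Download {percent}%.")
    return percentages
```

With the objects above, `run_download(MediaIoBaseDownload(fh, request))` would replace the `while` loop.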



9 votes

PyDrive lets you download a file with the GetContentFile() function. You can find the function's documentation here.

See the example below:

# Initialize GoogleDriveFile instance with file id.
file_obj = drive.CreateFile({'id': '<your file ID here>'})
file_obj.GetContentFile('cats.png') # Download file as 'cats.png'.

This code assumes you have an authenticated drive object; documentation on this can be found here and here.

In the general case, this is done like so:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
# Create local webserver which automatically handles authentication.
gauth.LocalWebserverAuth()

# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)

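Information on silent authentication on a server can be found here. It involves writing a settings.yaml file (example: here) in which you save the authentication details.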


4 votes

The documentation has a function that downloads a file given the ID of the file to download:

import io

import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload


def download_file(real_file_id):
    """Downloads a file.

    Args:
        real_file_id: ID of the file to download
    Returns:
        The file contents as bytes, or None if an error occurred.

    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()

    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)

        file_id = real_file_id

        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}%.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        return None  # avoid calling getvalue() on a failed download

    return file.getvalue()


if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')


As for the question itself: how do we get the file ID needed to download the file?

In general, the URL of a shared file on Google Drive looks like this:

https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing

where 1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh is the file ID.

You can simply copy it from the URL or, if you prefer, write a function that extracts the file ID from the URL.

For example, given the following

url = "https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing"

def url_to_id(url):
    x = url.split("/")
    return x[5]

printing x gives

['https:', '', 'drive.google.com', 'file', 'd', '1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh', 'view?usp=sharing']

so, since we want the sixth element of the list, we use x[5].
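Splitting on `/` and taking index 5 only works for links of exactly the `/file/d/<ID>/...` shape. Below is a slightly more tolerant sketch using `urllib.parse`. It assumes the ID is either the path segment after `d` (as in file, document and spreadsheet links) or an `id=` query parameter (as in `uc?id=` links). `drive_url_to_id` is a hypothetical name:

```python
from urllib.parse import parse_qs, urlparse


def drive_url_to_id(url):
    """Extract a Google Drive file ID from a share link or a uc?id= link."""
    parsed = urlparse(url)
    parts = parsed.path.split("/")
    # .../d/<ID>/... (file, document and spreadsheet links)
    if "d" in parts:
        index = parts.index("d") + 1
        if index < len(parts) and parts[index]:
            return parts[index]
    # ...?id=<ID>&... (uc links); parse_qs copes with extra parameters
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"could not find a file ID in {url!r}")
```

This covers the three link shapes that appear on this page: `/file/d/...`, `/spreadsheets/d/...` and `uc?id=...&export=download`.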


3 votes

This is also covered above:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

This starts its own local web server, which also does the dirty work of authentication.

file_obj = drive.CreateFile({'id': '<Put the file ID here>'})
file_obj.GetContentFile('Demo.txt')

This downloads the file.


3 votes
import requests


def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params={'id': id, 'confirm': 1}, stream=True)
    token = get_confirm_token(response)

    if token:
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)


def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None


def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)


if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)
This just repeats the accepted answer but adds the confirm=1 parameter, so the file always downloads, even if it is too large.
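Since the question mentions wget, you can also write the same request out as a plain URL. The following is a minimal sketch. `direct_download_url` is a hypothetical helper, and whether Google still honours `confirm=1` is an assumption that may stop holding if the warning page changes again:

```python
from urllib.parse import urlencode


def direct_download_url(file_id, confirm="1"):
    """Build the URL that the first requests call above sends, as one string."""
    # Same endpoint and parameters as URL + params in the snippet above
    query = urlencode({"export": "download", "id": file_id, "confirm": confirm})
    return "https://docs.google.com/uc?" + query
```

You can pass the printed URL to wget or curl. Quote it in the shell, because it contains `&`.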


1 vote
# Importing PyDrive OAuth
import logging
import zipfile

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

logger = logging.getLogger(__name__)


def download_tracking_file_by_id(file_id, download_dir, unzip_dir):
    gauth = GoogleAuth(settings_file='../settings.yaml')
    # Try to load saved client credentials
    gauth.LoadCredentialsFile("../credentials.json")
    if gauth.credentials is None:
        # Authenticate if they're not there
        gauth.LocalWebserverAuth()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile("../credentials.json")

    drive = GoogleDrive(gauth)

    logger.debug("Trying to download file_id " + str(file_id))
    file6 = drive.CreateFile({'id': file_id})
    file6.GetContentFile(download_dir + 'mapmob.zip')
    # Extract the archive that was just downloaded
    zipfile.ZipFile(download_dir + 'mapmob.zip').extractall(unzip_dir)
    tracking_data_location = download_dir + 'test.json'
    return tracking_data_location

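The function above downloads the file with the given file_id into the specified download folder. The remaining question is how to get the file_id. For a link of the form ...?id=<ID>, you can split the URL on id=. Parsing the query string is safer, because it also works when other parameters follow the ID:

from urllib.parse import parse_qs, urlparse

file_id = parse_qs(urlparse(url).query)["id"][0]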
    

1 vote

I tried it in Google Colaboratory:

https://colab.research.google.com/

Suppose your share link is

https://docs.google.com/spreadsheets/d/12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu/edit?usp=sharing&ouid=102608702203033509854&rtpof=true&sd=true

The only part you need is the ID, 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu

Command in a cell:

!gdown 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu
Run the cell and you will see that the file has been downloaded to /content/Amazon_Reviews.xlsx

Note: to learn how to use Google Colab


0 votes

This example is based on one similar to RayB's, but it keeps the file in memory and is a bit simpler. You can paste it into Colab and it works.

import googleapiclient.discovery
import oauth2client.client
from google.colab import auth

# Authenticate the Colab user
auth.authenticate_user()


def download_gdrive(id):
    creds = oauth2client.client.GoogleCredentials.get_application_default()
    service = googleapiclient.discovery.build('drive', 'v3', credentials=creds)
    # Returns the raw file contents, kept in memory
    return service.files().get_media(fileId=id).execute()


a = download_gdrive("1F-yaQB8fdsfsdafm2l8WFjhEiYSHZrCcr")
    

0 votes

I used the accepted solution for a long time, but Google has since changed the download-warning response, so it no longer works.

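I now use the API, since that is a safer way to make sure it doesn't suddenly stop working. However, I also got it working by parsing the response HTML to find the download URL, like this: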

import requests
from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    """Finds the action URL of <form id="download-form"> on the warning page."""

    def __init__(self):
        super().__init__()
        self.action = None

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = dict(attrs)
            if attrs.get("id") == "download-form":
                self.action = attrs.get("action")


DOWNLOAD_URL = 'https://docs.google.com/uc?export=download'
file_id = 'TAKE ID FROM SHAREABLE LINK'

session = requests.Session()
response = session.get(DOWNLOAD_URL, params={'id': file_id}, stream=True)
content_type = response.headers['content-type']
# An HTML response means Google sent the download-warning page, not the file
if content_type == 'text/html; charset=utf-8':
    parser = MyHTMLParser()
    parser.feed(response.text)
    download_url = parser.action
    response = session.post(download_url, stream=True)
file = response.content
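The parser above keeps only the form's `action`. If the warning page carries its parameters in hidden `<input>` fields instead of in the action URL, those fields have to be sent back as well. The following is a sketch under that assumption. The page layout is Google's and can change at any time, and `DownloadFormParser` is a hypothetical class:

```python
from html.parser import HTMLParser


class DownloadFormParser(HTMLParser):
    """Collect action, method and hidden inputs of <form id="download-form">.

    Assumption: the warning page wraps its download button in this form;
    the layout is not a documented API and may change.
    """

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "download-form":
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("type") == "hidden":
            name = attrs.get("name")
            if name:
                # Hidden inputs carry the parameters the form would submit
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False
```

After `parser = DownloadFormParser(); parser.feed(response.text)`, you could submit a GET form with `session.get(parser.action, params=parser.fields, stream=True)` and a POST form with `session.post(parser.action, data=parser.fields, stream=True)`.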
    

-3 votes

You can install

https://pypi.org/project/googleDriveFileDownloader/

pip install googleDriveFileDownloader



and download the file. Here is example download code:

from googleDriveFileDownloader import googleDriveFileDownloader

a = googleDriveFileDownloader()
a.downloadFile("https://drive.google.com/uc?id=1O4x8rwGJAh8gRo8sjm0kuKFf6vCEm93G&export=download")
    
© www.soinside.com 2019 - 2024. All rights reserved.