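These steps are written in the main() function below, and I would like to run them as a daily scheduled job using Cloud Functions and Cloud Scheduler.
However, the code below currently requires the user to log in to their Google account manually through a browser. I would like to rewrite it so that this login happens automatically, but I cannot work out how. Any help would be greatly appreciated.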
### ※※Authentication is required by browser※※
creds = flow.run_local_server(port=0)
### Result
Please visit this URL to authorize this application:
https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=132987612861-4j24afrouontpeiv5ryy7sn64inhr.apps.googleusercontent.com&redirect_uri=http%yyy%2Flocalhost%3yy6%2F&scope=httpsyyF%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly&state=XXXXXXXXXXXXXXXXXXXXXXXXXXX&access_type=offline
The readonly&state=XXXXXXXXXXXXXXXXXXXXX part changes on every run.
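To make the changing part concrete, here is a minimal sketch that compares two authorization URLs with the standard library. The URLs, client_id, port and state values are made-up placeholders with the same shape as the output above, not the real ones. The state parameter is a fresh token generated for each authorization request, which is why the URL cannot be predicted in advance.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical authorization URLs with the same shape as the one above;
# the client_id, port and state values are placeholders, not real ones.
url_run_1 = (
    "https://accounts.google.com/o/oauth2/auth?response_type=code"
    "&client_id=example.apps.googleusercontent.com"
    "&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2F"
    "&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly"
    "&state=AAAA1111&access_type=offline"
)
url_run_2 = url_run_1.replace("state=AAAA1111", "state=BBBB2222")


def query_params(url):
    """Return the query parameters of a URL as a plain dict of strings."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


def changed_params(first, second):
    """Return the sorted names of parameters whose values differ."""
    a, b = query_params(first), query_params(second)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


print(changed_params(url_run_1, url_run_2))
```

In this made-up pair only state differs. With port=0 the local port inside redirect_uri may differ between runs as well.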
Browser screen that opens when the code above is executed
import io
import json
import os

import key
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from google.auth.transport.requests import Request
from google.cloud import storage as gcs
from google.oauth2 import service_account
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload

# Read-only Drive access is enough to export the spreadsheet
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

def main(event, context):
    """Drive v3 API
    Function to access shared Drive→get Spreadsheet→convert to parquet→upload to GCS """
    creds = None
    file_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'  # Unedited data in shared drive
    mime_type = 'text/csv'

    # OAuth authentication to access shared drives
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # Allow users to log in if there are no (valid) credentials available
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            ### ※※Browser authentication required※※
            creds = flow.run_local_server(port=0)  ## Currently, we need a manual login here!
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    try:
        # Retrieve the spreadsheet from the shared drive as CSV
        service = build('drive', 'v3', credentials=creds)
        request = service.files().export_media(fileId=file_id, mimeType=mime_type)
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()

        # Read "Shared Drive/SpreadSheet" -> convert to parquet
        df = pd.read_csv(io.StringIO(fh.getvalue().decode()))
        table = pa.Table.from_pandas(df)
        buf = pa.BufferOutputStream()
        pq.write_table(table, buf, compression=None)

        # Service account credentials for saving to GCS
        key_path = 'service_account_file.json'
        with open(key_path) as f:
            service_account_info = json.load(f)
        credentials = service_account.Credentials.from_service_account_info(service_account_info)
        client = gcs.Client(
            credentials=credentials,
            project=credentials.project_id,
        )

        # GCS information to be saved
        bucket_name = 'bucket-name'
        blob_name = 'sample-folder/daily-data.parquet'  # save_path
        bucket = client.get_bucket(bucket_name)
        blob = bucket.blob(blob_name)

        # Save the parquet data to GCS
        blob.upload_from_string(data=buf.getvalue().to_pybytes())
        # If this message is printed, the data has been saved.
        print("Blob '{}' created to '{}'!".format(blob_name, bucket_name))
    except HttpError as error:
        # TODO(developer) - Handle errors from drive API.
        print(f'An error occurred: {error}')
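The credential handling in main() boils down to three outcomes, and only one of them opens the browser. The sketch below models that branch with a hypothetical CredState class (a stand-in for illustration, not part of any Google library) so the decision can be read on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CredState:
    """Hypothetical stand-in for the fields main() checks on creds."""
    valid: bool
    expired: bool
    refresh_token: Optional[str]


def next_step(creds: Optional[CredState]) -> str:
    """Mirror the branch structure of main(): reuse, refresh, or browser login."""
    if creds is not None and creds.valid:
        return "use as-is"
    if creds is not None and creds.expired and creds.refresh_token:
        return "refresh without browser"
    return "browser login required"


print(next_step(None))                                   # no token.json found
print(next_step(CredState(False, True, "refresh-xyz")))  # expired but refreshable
print(next_step(CredState(False, True, None)))           # expired, no refresh token
```

When no token.json is present at startup, creds stays None and the code lands in the last branch, which is the step that blocks in a scheduled, non-interactive run.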
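I tried driving the browser with selenium, but could not get it to work well because the browser login URL is different on every run. ← I may be able to find a way around this.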
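One possible direction, sketched under assumptions and not tested against this shared drive: the code already uses a service account for GCS, and the same kind of credential can be given the drive.readonly scope and passed to the Drive client, which removes the browser step. The function name build_drive_service and the constant DRIVE_SCOPES are made up for this sketch, and it assumes the service account's email address has been granted read access to the file (or added as a member of the shared drive).

```python
DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_drive_service(key_path="service_account_file.json"):
    """Build a Drive v3 client from a service account key, with no browser step.

    Assumption: the service account has read access to the target file,
    for example because it was added as a member of the shared drive.
    """
    # Imported inside the function so this sketch can be loaded even where
    # the Google client libraries are not installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_path, scopes=DRIVE_SCOPES
    )
    return build("drive", "v3", credentials=creds)
```

The returned object would take the place of service in main(), so the export_media call and everything after it could stay as written.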