How to read a link from a cell in a Google spreadsheet when it is inside an href tag (gspread)

Question (votes: 0, answers: 2)

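I'm new to Stack Overflow, so I apologize in advance if I'm doing something wrong.

I have a spreadsheet in Google Sheets, for example this one.

One of its cells contains a link inside an href tag. I want to get both the link and the text of that cell using the Google Sheets API or gspread.

I have already tried this solution, but the access token I get back is None.

I also tried web scraping with BeautifulSoup, but that did not work well.

For the bs4 approach I tried the following code, which I found here: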

from bs4 import BeautifulSoup
import requests

html = requests.get('https://docs.google.com/spreadsheets/d/1v8vM7yQ-27SFemt8_3IRiZr-ZauE29edin-azKpigws/edit#gid=0').text
soup = BeautifulSoup(html, "lxml")
tables = soup.find_all("table")

content = []

for table in tables:
    content.append([[td.text for td in row.find_all("td")] for row in table.find_all("tr")])

print(content)
python google-sheets gspread
2 Answers

0 votes

I figured it out. Here is the complete code, in case anyone needs it:

import requests
import gspread
import urllib.parse
import pickle



spreadsheetId = "###"  # Please set the Spreadsheet ID.
cellRange = "Yoursheetname!A1:A100"  # Please set the range in A1 notation. Only the hyperlink of the first cell in this range is read in step 3 below.


with open('token_sheets_v4.pickle', 'rb') as token:
    # get this file here
    # https://developers.google.com/identity/sign-in/web/sign-in
    credentials = pickle.load(token)

client = gspread.authorize(credentials)

# 1. Retrieve the access token.
access_token = client.auth.token

# 2. Request to the method of spreadsheets.get in Sheets API using `requests` module.
fields = "sheets(data(rowData(values(hyperlink))))"
url = "https://sheets.googleapis.com/v4/spreadsheets/" + spreadsheetId + "?ranges=" + urllib.parse.quote(cellRange) + "&fields=" + urllib.parse.quote(fields)
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res)

# 3. Retrieve the hyperlink.
obj = res.json()
print(obj)
link = obj["sheets"][0]['data'][0]['rowData'][0]['values'][0]['hyperlink']
print(link)
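
The code above reads only the first cell of the range, and it raises a KeyError when that cell has no link. Since the question asks for both the text and the link, here is a minimal sketch of how the parsed response could be walked row by row. It assumes the request was made with fields = "sheets(data(rowData(values(formattedValue,hyperlink))))" (formattedValue is an addition that is not in the answer above), and the helper name extract_text_and_links is hypothetical.

```python
def extract_text_and_links(obj):
    """Return a (text, link) pair for the first cell of every row.

    `obj` is the parsed JSON of a spreadsheets.get response. The API omits
    empty keys, so every level is read with .get() and a default.
    """
    pairs = []
    sheets = obj.get("sheets", [])
    if not sheets:
        return pairs
    for grid in sheets[0].get("data", []):
        for row in grid.get("rowData", []):
            values = row.get("values", [])
            if not values:
                # Completely empty row: keep its position in the result.
                pairs.append((None, None))
                continue
            cell = values[0]
            pairs.append((cell.get("formattedValue"), cell.get("hyperlink")))
    return pairs


# Hand-written sample shaped like a response (not real data).
sample = {
    "sheets": [{
        "data": [{
            "rowData": [
                {"values": [{"formattedValue": "Google",
                             "hyperlink": "https://www.google.com/"}]},
                {"values": [{"formattedValue": "plain text"}]},
                {},
            ]
        }]
    }]
}

for text, link in extract_text_and_links(sample):
    print(text, link)
```

With a real response, pass res.json() instead of the sample dictionary.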

Update!

A more elegant solution is the following. Create the service:

import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

CLIENT_SECRET_FILE = 'secret/secret.json'
API_SERVICE_NAME = 'sheets'
API_VERSION = 'v4'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']


def Create_Service():
    cred = None

    # Reuse cached credentials from a previous run, if any.
    pickle_file = f'secret/token_{API_SERVICE_NAME}_{API_VERSION}.pickle'
    if os.path.exists(pickle_file):
        with open(pickle_file, 'rb') as token:
            cred = pickle.load(token)

    # Refresh expired credentials, or run the OAuth flow if there are none.
    if not cred or not cred.valid:
        if cred and cred.expired and cred.refresh_token:
            cred.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
            cred = flow.run_local_server()

        # Cache the credentials for the next run.
        with open(pickle_file, 'wb') as token:
            pickle.dump(cred, token)

    try:
        service = build(API_SERVICE_NAME, API_VERSION, credentials=cred)
        print(API_SERVICE_NAME, 'service created successfully')
        return service
    except Exception as e:
        print('Unable to connect.')
        print(e)
        return None


service = Create_Service()

Then extract the links from every sheet in the spreadsheet as a convenient dictionary:

    fields = "sheets(properties(title),data(startColumn,rowData(values(hyperlink))))"
    
    print(service.spreadsheets().get(spreadsheetId=self.__spreadsheet_id,
                                     fields=fields).execute())
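
The call above returns nested JSON rather than a flat mapping. As a rough sketch, the result could be reduced to a dictionary keyed by sheet title. The function name links_by_sheet is hypothetical, and the sample response is hand-written to match the shape requested by the fields string above.

```python
def links_by_sheet(response):
    """Map each sheet title to the list of hyperlinks found in its cells."""
    result = {}
    for sheet in response.get("sheets", []):
        title = sheet.get("properties", {}).get("title", "")
        links = []
        for grid in sheet.get("data", []):
            for row in grid.get("rowData", []):
                for cell in row.get("values", []):
                    link = cell.get("hyperlink")
                    if link:  # cells without a link have no "hyperlink" key
                        links.append(link)
        result[title] = links
    return result


# Hand-written sample shaped like a response (not real data).
sample_response = {
    "sheets": [
        {
            "properties": {"title": "Sheet1"},
            "data": [{"rowData": [
                {"values": [{"hyperlink": "https://example.com/a"}, {}]},
                {},
                {"values": [{"hyperlink": "https://example.com/b"}]},
            ]}],
        },
        {"properties": {"title": "Sheet2"}, "data": [{}]},
    ]
}

print(links_by_sheet(sample_response))
```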

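So, how do fields work? Go to the description of the Spreadsheet object and look for its JSON representation. For example, to get the Sheet objects back from that JSON representation, just use fields = "sheets", because the JSON representation of a Spreadsheet has a "sheets" field.

OK, cool. We have the Sheet objects. How do we access the fields of a Sheet object? Just click on that type in the documentation and look up its fields.

So how do we combine fields? It's easy. For example, to return the "properties" and "data" fields of the Sheet objects, I write the fields string like this: fields = "sheets(properties,data)". We simply list them like the arguments of an ordinary function, but without spaces.

The same applies to the objects returned by the data field, and so on.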
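
To make the nesting rule concrete, here is a small sketch that builds a fields string from a nested dictionary. The helper build_fields is hypothetical and is not part of gspread or the Google client library. It only illustrates the "name(child,child)" pattern described above.

```python
def build_fields(tree):
    """Turn a nested dict into a fields string.

    Keys are field names. An empty dict means "return the whole field",
    a non-empty dict lists the sub-fields to keep.
    """
    parts = []
    for name, children in tree.items():
        if children:
            parts.append(f"{name}({build_fields(children)})")
        else:
            parts.append(name)
    # Fields are comma-separated with no spaces.
    return ",".join(parts)


tree = {
    "sheets": {
        "properties": {"title": {}},
        "data": {
            "startColumn": {},
            "rowData": {"values": {"hyperlink": {}}},
        },
    }
}

fields = build_fields(tree)
print(fields)
# sheets(properties(title),data(startColumn,rowData(values(hyperlink))))
```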


0 votes

You can do this with the _spreadsheets_get(self, params=None) method defined in gspread/spreadsheet.py.

Example:

params = {
    "spreadsheetId" : "spreadsheet_id_here",
    "ranges" : "Sheet1!A1:A1",
    "includeGridData" : True
}
print(spreadsheet._spreadsheets_get(params=params))

This returns a JSON object that contains the hyperlink-related data in its textFormatRuns section.
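
As a hedged sketch of how that section could be read: in the Sheets API, each entry of a cell's textFormatRuns carries an optional startIndex and a format, and a linked run has format.link.uri. The helper name links_from_text_runs and the sample cell below are hypothetical. The slicing assumes the cell text contains only characters where Python string indices match the API's indices.

```python
def links_from_text_runs(cell):
    """Return (linked_text, uri) pairs for one CellData dictionary."""
    text = cell.get("formattedValue", "")
    runs = cell.get("textFormatRuns", [])
    found = []
    for i, run in enumerate(runs):
        uri = run.get("format", {}).get("link", {}).get("uri")
        if not uri:
            continue
        # A run lasts until the next run starts; startIndex is omitted for 0.
        start = run.get("startIndex", 0)
        if i + 1 < len(runs):
            end = runs[i + 1].get("startIndex", len(text))
        else:
            end = len(text)
        found.append((text[start:end], uri))
    return found


# Hand-written sample cell with two links inside one text (not real data).
sample_cell = {
    "formattedValue": "See Google and Bing",
    "textFormatRuns": [
        {"format": {}},
        {"startIndex": 4, "format": {"link": {"uri": "https://www.google.com/"}}},
        {"startIndex": 10, "format": {}},
        {"startIndex": 15, "format": {"link": {"uri": "https://www.bing.com/"}}},
    ],
}

print(links_from_text_runs(sample_cell))
```

When the whole cell is a single link, the API may expose it through the cell's hyperlink field instead, so checking both places is the safer choice.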
