使用/在 Google Apps 脚本中在大型 Google 表格中搜索行的最快方法

问题描述 投票:0回答:2

GAS 非常强大,您可以使用 Google Sheet 作为数据库后端来编写完整的网络应用程序。有很多理由不这样做,但我认为在某些情况下这是可以的。

我认为最大的问题是在包含大量行的工作表中根据某些条件查找行时的性能问题。我知道有很多方法可以“查询”工作表,但我找不到关于哪种方法最快的可靠信息。

复杂性之一是许多人可以编辑工作表,这意味着您必须考虑多种情况。为了简单起见,我想假设这张表:

  • 已锁定,因此只有一个人可以看到
  • 第一列有行号(
    =row()
    )

最基本的查询是查找特定列等于某个值的行。

哪种方法最快?

performance google-apps-script google-sheets google-visualization google-query-language
2个回答
13
投票

我有一张大约 19k 行和大约 38 列的工作表,其中充满了各种未排序的真实数据。那是几乎 700k 行,所以我认为这将是一个很好的表,可以对几种方法进行计时,看看哪个是最快的。

  • 方法 1:获取工作表作为二维数组,然后遍历每一行
  • 方法2:获取sheet作为二维数组,对其进行排序,然后使用二分搜索算法找到行
  • 方法 3:对
    Google 可视化查询
    进行 UrlFetch 调用,并且不提供最后一行
  • 方法 4:对
    Google 可视化查询
    进行 UrlFetch 调用并提供最后一行

这是我的查询功能。

function method1(spreadsheetID, sheetName, columnIndex, query)
{
    // get the sheet values excluding header, 
    var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);

    // loop through each row
    for(var i = 0, numRows = rowValues.length; i < numRows; ++i)
    {
        // return it if found
        if(rowValues[i][columnIndex] == query) return rowValues[i]
    }

    return false;
}

function method2(spreadsheetID, sheetName, columnIndex, query)
{
    // get the sheet values excluding header
    var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);

    // sort it
    rowValues.sort(function(a, b){
        if(a[columnIndex] < b[columnIndex]) return -1;
        if(a[columnIndex] > b[columnIndex]) return 1;
        return 0;
    });

    // search using binary search
    var foundRow = matrixBinarySearch(rowValues, columnIndex, query, 0, rowValues.length - 1);

    // return if found
    if(foundRow != -1)
    {
        return rowValues[foundRow];
    }

    return false;
}

function method3(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
    // SQL like query
    myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";

    // the query URL
    // don't provide last row in range selection
    var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + ":" + queryColumnLetterEnd + '&tq=' + encodeURIComponent(myQuery);

    // fetch the data
    var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();

    // remove some crap from the return string
    return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}

function method4(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
    // find the last row in the sheet
    var lastRow = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getLastRow();

    // SQL like query
    myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";

    // the query URL
    var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + "1:" + queryColumnLetterEnd + lastRow + '&tq=' + encodeURIComponent(myQuery);

    // fetch the data
    var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();

    // remove some crap from the return string
    return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}

我的二分搜索算法:

function matrixBinarySearch(matrix, columnIndex, query, firstIndex, lastIndex)
{
    // find the value using binary search
    // https://www.w3resource.com/javascript-exercises/javascript-array-exercise-18.php

    // first make sure the query string is valid
    // if it is less than the smallest value
    // or larger than the largest value
    // it is not valid
    if(query < matrix[firstIndex][columnIndex] || query > matrix[lastIndex][columnIndex]) return -1;

    // if its the first row
    if(query == matrix[firstIndex][columnIndex]) return firstIndex;

    // if its the last row
    if(query == matrix[lastIndex][columnIndex]) return lastIndex;

    // now start doing binary search
    var middleIndex = Math.floor((lastIndex + firstIndex)/2);

    while(matrix[middleIndex][columnIndex] != query && firstIndex < lastIndex)
    {
        if(query < matrix[middleIndex][columnIndex])
        {
            lastIndex = middleIndex - 1;
        }
        else if(query > matrix[middleIndex][columnIndex])
        {
            firstIndex = middleIndex + 1;
        }

        middleIndex = Math.floor((lastIndex + firstIndex)/2);
    }

    return matrix[middleIndex][columnIndex] == query ? middleIndex : -1;
}

这是我用来测试它们的函数:

// each time this function is called it will try one method
// the first time it is called it will try method1
// then method2, then method3, then method4
// after it does method4 it will start back at method1
// we will use script properties to save which method is next
// we also want to use the same query string for each batch so we'll save that in script properties too
function testIt()
{
    // get the sheet where we're staving run times
    var runTimesSheet = SpreadsheetApp.openById("...").getSheetByName("times");

    // we want to see true speed tests and don't want server side caching so we a copy of our data sheet
    // make a copy of our data sheet and get its ID
    var tempSheetID = SpreadsheetApp.openById("...").copy("temp sheet").getId();

    // get script properties
    var scriptProperties = PropertiesService.getScriptProperties();

    // the counter
    var searchCounter = Number(scriptProperties.getProperty("searchCounter"));

    // index of search list we want to query for
    var searchListIndex = Number(scriptProperties.getProperty("searchListIndex"));

    // if we're at 0 then we need to get the index of the query string
    if(searchCounter == 0)
    {
        searchListIndex = Math.floor(Math.random() * searchList.length);
        scriptProperties.setProperty("searchListIndex", searchListIndex);
    }

    // query string
    var query = searchList[searchListIndex];

    // save relevant data
    var timerRow = ["method" + (searchCounter + 1), searchListIndex, query, 0, "", "", "", ""];

    // run the appropriate method
    switch(searchCounter)
    {
        case 0:
            // start time
            var start = (new Date()).getTime();

            // run the query
            var ret = method1(tempSheetID, "Extract", 1, query);

            // end time
            timerRow[3] = ((new Date()).getTime() - start) / 1000;

            // if we found the row save its values in the timer output so we can confirm it was found
            if(ret)
            {
                timerRow[4] = ret[0];
                timerRow[5] = ret[1];
                timerRow[6] = ret[2]; 
                timerRow[7] = ret[3];
            }
            break;
        case 1:
            var start = (new Date()).getTime();
            var ret = method2(tempSheetID, "Extract", 1, query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret)
            {
                timerRow[4] = ret[0];
                timerRow[5] = ret[1];
                timerRow[6] = ret[2]; 
                timerRow[7] = ret[3];
            }
            break;
        case 2:
            var start = (new Date()).getTime();
            var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret.table.rows.length)
            {
                timerRow[4] = ret.table.rows[0].c[0].v;
                timerRow[5] = ret.table.rows[0].c[1].v;
                timerRow[6] = ret.table.rows[0].c[2].v;
                timerRow[7] = ret.table.rows[0].c[3].v;
            }
            break;
        case 3:
            var start = (new Date()).getTime();
            var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret.table.rows.length)
            {
                timerRow[4] = ret.table.rows[0].c[0].v;
                timerRow[5] = ret.table.rows[0].c[1].v;
                timerRow[6] = ret.table.rows[0].c[2].v;
                timerRow[7] = ret.table.rows[0].c[3].v;
            }
            break;
    }

    // delete the temp file
    DriveApp.getFileById(tempSheetID).setTrashed(true);

    // save run times
    runTimesSheet.appendRow(timerRow);

    // start back at 0 if we're the end
    if(++searchCounter == 4) searchCounter = 0;

    // save the search counter
    scriptProperties.setProperty("searchCounter", searchCounter);
}

我有一个全局变量

searchList
,它是各种查询字符串的数组——有些在工作表中,有些不在工作表中。

我在触发器上运行

testit
,每分钟运行一次。经过 152 次迭代后,我得到了 38 个批次。查看结果,这是我对每种方法看到的结果:

| Method  | Minimum Seconds | Maximum Seconds | Average Seconds |
|---------|-----------------|-----------------|-----------------|
| method1 |            8.24 |           36.94 |           11.86 |
| method2 |            9.93 |           23.38 |           14.09 |
| method3 |            1.92 |            5.48 |            3.06 |
| method4 |            2.20 |           11.14 |            3.36 |

因此,至少对于我的数据集来说,使用 Google 可视化查询 是最快的。


0
投票
How to append this data in my sheet after using the method 3

when I am using it
data.rows[2].c[0].v // saying c is not defined how to resolve this issue?





var data =  "rows": [
{
"c": [
{
"v": "91880d426361477 _ 7-26-2023, 12_57_52 PM.aac"
},
{
"v": "91884024661477"
},
{
"v": "https://drive.google.com/file/d/1H11dizVaqga9VsdeEFVKpdfwbaXPWp1yxsy3Nfd/view?usp=drivesdk"
},
{
"v": "Tasleem"
},
{
"v": "7-26-2023 12:57:52 PM"
},
{
"v": "1H11izdsfVaqga9VeEsdfFVKpwbaXdfPWp1sdfyxsy3N"
},
{
"v": "https://docs.google.com/uc?export=download&id=1H11izVdsfaqga9VeEFsdfVKpwbaXPWp1dsfyxsy3N"
}
]
},
{
"c": [
{
"v": "91854026641477 _ 7-12-2023, 12_19_25 PM.aac"
},
{
"v": "918802664561477"
},
{
"v": "https://drive.google.com/file/d/1ZVX92zTasdfsdoMjEpSY0Dr-yAnz0B1W8dfnJkH/view?usp=drivesdk"
},
{
"v": "Tasleem"
},
{
"v": "7-12-2023 12:19:25 PM"
},
{
"v": "1ZVX92zTadsfoMjEpSY0Dsdfr-yAnz0Bdf1W8nJkH"
},
{
"v": "https://docs.google.com/uc?export=download&id=1ZVX92zTaosdfsdMjEpSfdY0Dr-yAnz0Bfsdsd1W8nJkH"
}
]
}
© www.soinside.com 2019 - 2024. All rights reserved.