Running a loop of HTTP queries synchronously

Question

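I'm writing some Node.js code to scrape a website that doesn't want to be scraped. I need to retrieve information on about 3,500 items. I could run 3,500 searches by hand on their search page and cut and paste the results, but that's no fun, and I may need to repeat it every few weeks. I wrote some code that worked, but when I ran a test I realized that Node.js is asynchronous (duh), and that running the full data set would look like a DoS attack. After trying various techniques, I settled on "eachOfSeries", because the documentation says it "runs only a single async operation at a time". But my results come back in random order, which leads me to believe the queries are running in parallel. So (1) are my queries running one at a time? (2) How do I fix my code so that a query doesn't start until the previous one has finished? I don't care about blocking or efficiency. In fact, I may even want to put a delay between queries. I also came across async waterfall. So (3) is that a good way to do what I need? Here is a test that shows how I'm running the searches: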

const fetch = require('node-fetch');
const async = require('async');
const readline = require('readline');
const fs = require('fs');
const readInterface = readline.createInterface({
    input: fs.createReadStream('ISBNtest.txt'),
//    output: process.stdout,
    console: false
});

function runit() {

    var d = new Date();

//  const data = fs.readFileSync('search_keys.txt', 'UTF-8');
    const data = 'async\nkey2\nkey3\neachOfSeries';  // for testing
// split the contents by new line
    const lines = data.split(/\r?\n/);
    var linesNum = lines.length;
    console.log('Lines '+linesNum);

    async.eachOfSeries(lines, function(Key, index, callback){
    fetch('https://github.com/search/?q='+Key+'&ref=simplesearch')
        .then(res => res.text())
        .then(body => proc(Key,index,body))
        .then(callback(null, null))
        .catch(err => console.log('error0 '+err))
        ;
    }, 
    function(err, results){
    if(err){
        console.error('error1 '+err);
    } else {
        console.log('all Keys processed.');
    }
    });
}
runit();
function proc(Key,index,body) {
    //  code here will scrape the HTML for data I need

    var d = new Date().toLocaleTimeString();
    console.log(d+" "+index+" "+Key+" "+body.substring(20,30))
}

Answer 1

I should have done a better job of searching for a solution. I found "Nodejs Synchronous For each loop", which solves my problem nicely using recursion.
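
For anyone landing here, one likely reason the queries in the question overlap is the line `.then(callback(null, null))`: it calls `callback` immediately and hands its return value to `.then`, so `eachOfSeries` moves on to the next key before the fetch has finished. Passing a function instead, e.g. `.then(() => callback())`, should defer it. The recursive approach mentioned above can be sketched roughly as follows. This is only an illustration, not the code from the linked post: `runSequentially` and `delayMs` are names made up here, and `task` is assumed to return a promise (or a plain value).

```javascript
// Hypothetical helper (not part of any library): runs task(item, index)
// for each item strictly one at a time, using recursion.
function runSequentially(items, task, delayMs = 0, index = 0) {
    // Base case: every item has been processed.
    if (index >= items.length) {
        return Promise.resolve();
    }
    // Run the current task and wait for it to settle before continuing.
    return Promise.resolve(task(items[index], index))
        // Pause after each task so requests are spaced out.
        .then(() => new Promise(resolve => setTimeout(resolve, delayMs)))
        // Only now recurse into the next item.
        .then(() => runSequentially(items, task, delayMs, index + 1));
}

// Possible usage with the names from the question (assumes fetch, proc
// and lines are defined as in the question's code):
//
// runSequentially(lines, (Key, index) =>
//     fetch('https://github.com/search/?q=' + encodeURIComponent(Key) + '&ref=simplesearch')
//         .then(res => res.text())
//         .then(body => proc(Key, index, body)),
//     1000)
//     .then(() => console.log('all Keys processed.'))
//     .catch(err => console.error('error1 ' + err));
```

Because each recursive call is made only inside the `.then` of the previous task, a new query cannot start until the previous one has completed and the delay has elapsed. A rejected task stops the chain, and the error reaches the final `.catch`.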
