Scraping Google Dictionary

Question (votes: 0, answers: 3)

I am trying to scrape Google Dictionary and build an unofficial API from it. I am using the Cheerio and request packages for Node.js to do this.

Here is my code:

var cheerio = require("cheerio");
var request = require('request');

request({
    method: 'GET',
    url: 'https://www.google.co.in/search?q=define+love'
}, function(err, response, body) {

    if(err){
        return console.error(err)
    }


    var $ = cheerio.load(body);

    var a = $(".vk_ans span").text();
    console.log(a);

});

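I am starting with the page "https://www.google.co.in/search?q=define+love" and trying to scrape the bold word "love", which is written in a span inside a div with the class vk_ans.

But when I console.log the result, I get an empty line. Everywhere else I do exactly the same thing and Cheerio works fine. What am I missing?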
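An empty string from .text() means the selector matched nothing in the HTML the script actually received, and that HTML can differ from what a browser shows for the same URL. A quick check is to test whether the raw body contains the class at all before blaming the selector. The helper below, htmlHasClass, is a hypothetical, dependency-free sketch that scans class attributes with a regular expression; it is a heuristic, not an HTML parser.

```javascript
// Hypothetical diagnostic helper (not part of cheerio): returns true if any
// class="..." or class='...' attribute in the raw HTML lists className.
// Regex-based heuristic; enough to tell whether a class selector can match at all.
function htmlHasClass(html, className) {
  const classAttr = /class\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const match of html.matchAll(classAttr)) {
    const value = match[1] ?? match[2]; // double- or single-quoted value
    if (value.split(/\s+/).includes(className)) {
      return true;
    }
  }
  return false;
}

// Example: call it on `body` inside the request callback before cheerio.load().
console.log(htmlHasClass('<div class="vk_ans c"><span>love</span></div>', "vk_ans")); // true
console.log(htmlHasClass('<div class="mw">love</div>', "vk_ans")); // false
```

If this returns false for the real response body, the problem is the markup Google sent, not Cheerio.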

javascript node.js web-scraping user-agent cheerio
3 Answers
Votes: 2

You need a User-Agent header so that Google does not identify you as a bot. Try this:

var cheerio = require("cheerio");
var request = require('request');

request({
  method: 'GET',
  url: 'https://www.google.co.in/search?q=define+love',
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
  }
}, function(err, response, body) {

  if (err) {
    return console.error(err);
  }

  var $ = cheerio.load(body);

  var a = $(".mw").text();
  console.log(a);

});
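The same request can be sketched without the request package (which is deprecated) by using the global fetch available in Node 18 and later. buildDefineRequest and fetchDefinePage are hypothetical helper names; the User-Agent string is the one from this answer, and whether Google serves the dictionary markup for a given User-Agent is not guaranteed and may change over time.

```javascript
// User-Agent string taken from this answer; other current desktop browser
// strings are assumed to behave similarly, but Google may change what it serves.
const DESKTOP_UA =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36";

// Hypothetical helper: builds the search URL and headers for "define <word>".
function buildDefineRequest(word, host = "www.google.co.in") {
  const url = new URL(`https://${host}/search`);
  url.searchParams.set("q", `define ${word}`); // the space is encoded as "+"
  return { url: url.toString(), headers: { "User-Agent": DESKTOP_UA } };
}

// Hypothetical helper: fetches the raw HTML with Node 18+'s global fetch.
// The returned string can be passed to cheerio.load() exactly as in the answer.
async function fetchDefinePage(word) {
  const { url, headers } = buildDefineRequest(word);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`Google responded with HTTP ${response.status}`);
  }
  return response.text();
}

console.log(buildDefineRequest("love").url); // https://www.google.co.in/search?q=define+love
```

Keeping URL and header construction in a pure function makes it easy to check the exact request being sent without hitting the network.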

Votes: 1

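Alternatively, you can use the Google Direct Answer Box API from SerpApi. SerpApi has a free plan with 100 searches per month; paid plans are available if you need more searches.

The difference is that all you need to do is iterate over ready-made structured JSON, instead of coding everything from scratch, figuring out how to get around Google's blocking, and picking the right selectors, which can be very time-consuming. Have a look at the playground.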

Full code (in the online IDE):

const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); //your api key from serpapi.com

const searchString = "define love";                         // what we want to search

const params = {
  engine: "google",                                         // search engine
  q: searchString,                                          // search query
  google_domain: "google.com",                              // google domain of the search
  gl: "us",                                                 // parameter defines the country to use for the Google search
  hl: "en",                                                 // Parameter defines the language to use for the Google search
};

const getAnswerBoxData = ({ answer_box }) => {
  return answer_box?.syllables;                             // undefined instead of a TypeError when there is no answer box
};

const getJson = (params) => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};

getJson(params).then(getAnswerBoxData).then(console.log);
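getAnswerBoxData only reads syllables. A slightly more defensive sketch (getDictionaryEntry is a hypothetical name) returns null when there is no answer box or when it is not a dictionary result, and otherwise picks out the type, syllables, word_type and definitions fields that appear in the full answer box output shown in this answer. Other answer box types are assumed to be irrelevant here.

```javascript
// Hypothetical replacement for getAnswerBoxData. Field names follow the
// answer_box output shown in this answer; non-dictionary boxes are ignored.
function getDictionaryEntry(json) {
  const box = json?.answer_box;
  if (!box || box.type !== "dictionary_results") {
    return null; // no dictionary box for this query
  }
  return {
    word: box.syllables,
    wordType: box.word_type,
    definitions: box.definitions ?? [],
  };
}

// Usage with the getJson helper from the code above:
// getJson(params).then(getDictionaryEntry).then(console.log);
console.log(getDictionaryEntry({ organic_results: [] })); // null
```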

Output:

love

Full SerpApi answer box output for the definition:

{
   "type":"dictionary_results",
   "syllables":"love",
   "word_type":"noun",
   "definitions":[
      "an intense feeling of deep affection.",
      "a great interest and pleasure in something.",
      "feel deep affection for (someone).",
      "like or enjoy very much."
   ],
   "extras":[
      "deep affection",
      "fondness",
      "tenderness",
      "warmth",
      "intimacy",
      "attachment",
      "endearment",
      "devotion",
      "adoration",
      "doting",
      "idolization",
      "worship",
      "passion",
      "ardor",
      "desire",
      "lust",
      "yearning",
      "infatuation",
      "adulation",
      "besottedness",
      "compassion",
      "care",
      "caring",
      "regard",
      "solicitude",
      "concern",
      "friendliness",
      "friendship",
      "kindness",
      "charity",
      "goodwill",
      "sympathy",
      "kindliness",
      "altruism",
      "philanthropy",
      "unselfishness",
      "benevolence",
      "brotherliness",
      "sisterliness",
      "fellow feeling",
      "humanity",
      "relationship",
      "love affair",
      "affair",
      "romance",
      "liaison",
      "affair of the heart",
      "intrigue",
      "amour",
      "hatred",
      "liking",
      "weakness",
      "partiality",
      "bent",
      "leaning",
      "proclivity",
      "inclination",
      "disposition",
      "enjoyment",
      "appreciation",
      "soft spot",
      "taste",
      "delight",
      "relish",
      "passion",
      "zeal",
      "appetite",
      "zest",
      "enthusiasm",
      "keenness",
      "predilection",
      "penchant",
      "fondness",
      "be in love with",
      "be infatuated with",
      "be smitten with",
      "be besotted with",
      "be passionate about",
      "care very much for",
      "feel deep affection for",
      "hold very dear",
      "adore",
      "think the world of",
      "be devoted to",
      "dote on",
      "cherish",
      "worship",
      "idolize",
      "treasure",
      "prize",
      "be mad/crazy/nuts/wild about",
      "have a pash on",
      "carry a torch for",
      "be potty about",
      "hate",
      "loathe",
      "detest",
      "like very much",
      "delight in",
      "enjoy greatly",
      "have a passion for",
      "take great pleasure in",
      "derive great pleasure from",
      "have a great liking for",
      "be addicted to",
      "relish",
      "savor",
      "have a weakness for",
      "be partial to",
      "have a soft spot for",
      "have a taste for",
      "be taken with",
      "have a predilection for",
      "have a proclivity for",
      "have a penchant for",
      "get a kick from/out of",
      "have a thing about/for",
      "be mad for/about",
      "be crazy/nuts/wild about",
      "be hooked on",
      "get off on",
      "get a buzz from/out of",
      "be potty about",
      "go a bundle on"
   ]
}
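Because the result is plain JSON, turning it into a readable dictionary entry is ordinary array work. The sketch below (formatEntry is a hypothetical name) prints the word, its word type and numbered definitions, plus a few related terms from extras. extras contains repeats such as "passion" and "fondness", so they are de-duplicated with a Set first.

```javascript
// Hypothetical formatter for an answer_box object shaped like the output above.
function formatEntry(answerBox, maxExtras = 5) {
  // A Set keeps the first occurrence of each term, in original order.
  const extras = [...new Set(answerBox.extras ?? [])].slice(0, maxExtras);
  const lines = [`${answerBox.syllables} (${answerBox.word_type})`];
  (answerBox.definitions ?? []).forEach((definition, i) => {
    lines.push(`  ${i + 1}. ${definition}`);
  });
  if (extras.length > 0) {
    lines.push(`  related: ${extras.join(", ")}`);
  }
  return lines.join("\n");
}

// A trimmed-down copy of the output above.
const sample = {
  type: "dictionary_results",
  syllables: "love",
  word_type: "noun",
  definitions: ["an intense feeling of deep affection.", "a great interest and pleasure in something."],
  extras: ["deep affection", "fondness", "fondness", "warmth"],
};
console.log(formatEntry(sample, 3));
// love (noun)
//   1. an intense feeling of deep affection.
//   2. a great interest and pleasure in something.
//   related: deep affection, fondness, warmth
```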

Disclaimer: I work for SerpApi.


Votes: 0

I found this interesting.

From https://www.google.com/advanced_search you can build a URL like the one below; you just need to set the language and add "define" to the search:

https://www.google.com/search?as_q=&as_epq=define+love&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=lang_pt&cr=&as_qdr=all&as_sitesearch=&as_occt=any&as_filetype=&tbs=
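That advanced-search URL can also be built programmatically. buildAdvancedDefineUrl is a hypothetical helper: it keeps every parameter from the URL in this answer, puts "define <word>" into as_epq and sets lr to lang_<code> (lang_pt, as in the URL, is Portuguese). Whether Google shows a dictionary box for a given language is not guaranteed.

```javascript
// Hypothetical helper reproducing the advanced-search URL from this answer.
// Empty parameters are kept so the result matches that URL exactly.
function buildAdvancedDefineUrl(word, langCode) {
  const params = new URLSearchParams({
    as_q: "",
    as_epq: `define ${word}`, // spaces are encoded as "+"
    as_oq: "",
    as_eq: "",
    as_nlo: "",
    as_nhi: "",
    lr: `lang_${langCode}`,
    cr: "",
    as_qdr: "all",
    as_sitesearch: "",
    as_occt: "any",
    as_filetype: "",
    tbs: "",
  });
  return `https://www.google.com/search?${params.toString()}`;
}

console.log(buildAdvancedDefineUrl("love", "pt"));
```

The resulting URL can then be requested with a User-Agent header, as in the first answer.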

© www.soinside.com 2019 - 2024. All rights reserved.