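This is my first attempt at web scraping with the cheerio library. I am scraping paginas amarillas for data such as company names, addresses, and so on. The address sits in a span that has no class, only itemprop="streetAddress". I have tried different approaches, because the selector needed to reach it is a very long string. I can get data all the way down to the element just before the itemprop one, but I don't know how to target the itemprop selector. The problem is the pathAddress1 constant: it logs an empty array to the console. If I remove the last part of the string (itemprop="streetAddress"), it returns data, but not exactly the data I want.
Here is the code:
const cheerio = require("cheerio");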
const request = require("request-promise");
// Selectors for the paginas amarillas elements we are targeting
const pathProfesionals1 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ip .box .cabecera .row .col-xs-11.comercial-nombre a h2 span";
const pathProfesionals2 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ig .box .cabecera .row .col-xs-11.comercial-nombre a h2 span";
const pathTelephones1 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ip .box .pie-pastilla .row .col-xs-4 a.llama-desplegable.btn.btn-amarillo.btn-block.phone.hidden.d-none span";
const pathTelephones2 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ig .box .pie-pastilla .row .col-xs-4 a.llama-desplegable.btn.btn-amarillo.btn-block.phone.hidden.d-none span";
const pathAddress1 = `.container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ig .box .row a .location span *[itemprop = 'streetAddress']`;
const pathAddress2 = "";
init = async () => {
  const arrCompanyName = [];
  const arrTelephones = [];
  const arrAddresses = [];
  const { category, city } = this.state;
  const $ = await request({
    uri: `https://www.paginasamarillas.es/search/${category}/all-ma/${city}/all-is/malaga/all-ba/all-pu/all-nc/1?what=carpintero&where=malaga&ub=false&qc=true`,
    transform: body => cheerio.load(body) // once the request completes, hand the body to cheerio to parse
  });
  const profesionals1 = $(pathProfesionals1).each((i, el) =>
    arrCompanyName.push($(el).text())
  );
  const telephones1 = $(pathTelephones1).each((i, el) =>
    arrTelephones.push($(el).text())
  );
  const profesionals2 = $(pathProfesionals2).each((i, el) =>
    arrCompanyName.push($(el).text())
  );
  const telephones2 = $(pathTelephones2).each((i, el) =>
    arrTelephones.push($(el).text())
  );
  const addresses1 = $(pathAddress1).each((i, el) =>
    arrAddresses.push($(el).text())
  );
  console.log(arrAddresses);
};
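If you only want to select the span tag that has the itemprop='streetAddress' attribute, you need to remove the * from your address selector. Currently, Cheerio tries to find any tag with itemprop='streetAddress' inside a span, and not a span with itemprop='streetAddress'.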