I need to get the code and the product title. I tried this, but it returns "Cannot read property 'innerText' of null".
HTML
<th data-code="XXXXXX">
<div>
<div id="title"><span>Product title</span></div>
</div>
</th>
<th data-code="XXXXXX">
<div>
<div id="title"><span>Product title</span></div>
</div>
</th>
Puppeteer
let table = await page.$$eval(
  'table th',
  divs => divs.map((div, index) => ({
    title: div.querySelector('#title').innerText,
    code: div.dataset.code
  }))
);
Assuming the HTML snippet has a <table> wrapper, your existing code works fine for me:
const puppeteer = require("puppeteer"); // ^22.6.0
const html = `
<table>
<th data-code="XXXXXX">
<div>
<div id="title"><span>Product title</span></div>
</div>
</th>
<th data-code="XXXXXX 2">
<div>
<div id="title"><span>Product title 2</span></div>
</div>
</th>
</table>`;
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);

  // your exact code:
  let table = await page.$$eval(
    'table th',
    divs => divs.map((div, index) => ({
      title: div.querySelector('#title').innerText,
      code: div.dataset.code
    }))
  );
  console.log(table);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Output:
[
{ title: 'Product title', code: 'XXXXXX' },
{ title: 'Product title 2', code: 'XXXXXX 2' }
]
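The actual site may involve shadow DOM, iframes, arbitrary JS behavior, a Cloudflare block, A/B tests or some other mitigating factor, so please share a minimal example that includes the actual page for further help.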
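In the meantime, a defensive variant may help narrow down which header cells trigger the error on the real page. This is only a sketch under assumptions: `extractRows` and `scrapeTable` are hypothetical names, the 10-second timeout is arbitrary, and the selectors are taken from the HTML in the question rather than from the actual site.

```javascript
// Runs in the browser context: tolerate <th> cells that lack a #title child
// instead of throwing on null. (extractRows is a hypothetical helper name.)
const extractRows = ths =>
  ths.map((th, index) => {
    const titleEl = th.querySelector("#title");
    return {
      index,
      // null marks the cells that would have thrown in the original code
      title: titleEl ? titleEl.innerText.trim() : null,
      code: th.dataset.code ?? null,
    };
  });

// Runs in Node: wait until at least one title has rendered, then extract.
// `page` is assumed to be a Puppeteer Page; the timeout value is arbitrary.
const scrapeTable = async page => {
  await page.waitForSelector("table th #title", { timeout: 10_000 });
  return page.$$eval("table th", extractRows);
};
```

Calling `console.log(await scrapeTable(page))` in place of the original `$$eval` call should show which rows come back with `title: null`. Those are header cells that do not match the posted HTML (an empty corner cell would be one possibility). If `waitForSelector` times out instead, that would suggest the titles are rendered late, inside an iframe or shadow root, or not at all.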