Puppeteer waitForSelector on multiple selectors

Question (votes: 0, answers: 11)

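I'm using Puppeteer to drive a website with a lookup form that can return either a result or a "No records found" message. How can I tell which one was returned? waitForSelector seems to wait for only one selector at a time, and waitForNavigation doesn't seem to work because the result comes back via Ajax. I'm using try/catch, but it is hard to get right and slows everything down.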

try {
    await page.waitForSelector(SELECTOR1,{timeout:1000}); 
}
catch(err) { 
    await page.waitForSelector(SELECTOR2);
}
javascript puppeteer screen-scraping
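11 Answers
32 votes

Waiting for any of the elements to exist

You can use querySelectorAll and waitForFunction together to solve this problem. Passing all the selectors separated by commas returns every node that matches any of the selectors.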

await page.waitForFunction(() => 
  document.querySelectorAll('Selector1, Selector2, Selector3').length
);

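This resolves as soon as at least one matching element exists, but it only yields a truthy value; it does not tell you which selector matched which elements.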
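If you also need to know which selector matched, one option is to have the in-page function return the selector itself. The following is a minimal sketch, assuming page is a Puppeteer Page; the helper name waitForAnyOf and its parameters are made up for illustration.

```javascript
// Hypothetical helper: waits until any selector matches and reports which one.
async function waitForAnyOf(page, selectors, timeout = 10000) {
  const handle = await page.waitForFunction(
    (sels) => {
      // Runs in the browser: return the first selector that currently matches.
      // A non-empty string is truthy, so waitForFunction stops polling.
      const hit = sels.find((s) => document.querySelector(s));
      return hit || false;
    },
    { timeout },
    selectors
  );
  // The handle wraps the returned string; convert it to a plain value.
  return handle.jsonValue();
}
```

A call such as waitForAnyOf(page, [SELECTOR1, SELECTOR2]) would then resolve to whichever selector string matched first in the list order.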


22 votes

How about using Promise.race(), as I did in the snippet below? And don't forget the { visible: true } option of the page.waitForSelector() method.

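public async enterUsername(username:string) : Promise<void> {
    // Race both waits; the first selector to become visible wins.
    const un = await Promise.race([
        this.page.waitForSelector(selector_1, { timeout: 4000, visible: true })
        .catch(() => null), // swallow the losing wait's timeout
        this.page.waitForSelector(selector_2, { timeout: 4000, visible: true })
        .catch(() => null),
    ]);

    // Both waits timed out, so there is nothing to type into.
    if (!un) {
        throw new Error('Neither username field became visible');
    }
    await un.focus();
    await un.type(username);
}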

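12 votes

I think the best way to approach this is from a more CSS-based perspective.

waitForSelector appears to follow the CSS selector list rules, so essentially you can select multiple CSS elements just by separating them with commas.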

try {    
    await page.waitForSelector('.selector1, .selector2',{timeout:1000})
} catch (error) {
    // handle error
}

9 votes

Based on Md. Abu Taher's suggestion, I ended up with this:

// One of these SELECTORs should appear, we don't know which
await page.waitForFunction((sel) => { 
    return document.querySelectorAll(sel).length;
},{timeout:10000},SELECTOR1 + ", " + SELECTOR2); 

// Now see which one appeared:
try {
    await page.waitForSelector(SELECTOR1,{timeout:10});
    // SELECTOR1 found
}
catch(err) {
    // check for "not found"
    let ErrMsg = await page.evaluate((sel) => {
        let element = document.querySelector(sel);
        return element? element.innerHTML: null;
    },SELECTOR2);
    if(ErrMsg){
        // SELECTOR2 found
    }else{
        // Neither found, try adjusting timeouts until you never get this...
    }
}

8 votes

In Puppeteer you can simply use multiple selectors separated by commas, like this:

const foundElement = await page.waitForSelector('.class_1, .class_2');

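The returned element will be an ElementHandle for the first matching element found in the page.

Next, if you want to know which element was found, you can get its class name like this:

const className = await page.evaluate(el => el.className, foundElement);

In your case, code similar to this should work: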

const foundElement = await page.waitForSelector([SELECTOR1,SELECTOR2].join(','));
const responseMsg = await page.evaluate(el => el.innerText, foundElement);
if (responseMsg == "No records found") {
  // Your code here
}
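Comparing class names or inner text works when you control the markup. As a more general alternative, the matched element can be asked which of the original selectors it satisfies via Element.matches. This is only a sketch; the helper name waitAndIdentify is hypothetical, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: waits on a comma-joined selector list, then asks the
// matched element which of the original selectors it satisfies.
async function waitAndIdentify(page, selectors, options = {}) {
  const handle = await page.waitForSelector(selectors.join(', '), options);
  const matched = await handle.evaluate(
    // Runs in the browser; Element.matches tests one selector at a time.
    (el, sels) => sels.find((s) => el.matches(s)) || null,
    selectors
  );
  return { handle, matched };
}
```

With this, const { matched } = await waitAndIdentify(page, [SELECTOR1, SELECTOR2]) gives back the selector string, so the branch can compare against SELECTOR2 instead of message text.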

6 votes

I had a similar problem and went with this simple solution:

helpers.waitForAnySelector = (page, selectors) => new Promise((resolve, reject) => {
  let hasFound = false
  selectors.forEach(selector => {
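    // page.waitFor was deprecated and later removed; waitForSelector is the equivalent for selectors
    page.waitForSelector(selector)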
      .then(() => {
        if (!hasFound) {
          hasFound = true
          resolve(selector)
        }
      })
      .catch((error) => {
        // console.log('Error while looking up selector ' + selector, error.message)
      })
  })
})

Then use it:

const selector = await helpers.waitForAnySelector(page, [
  '#inputSmsCode', 
  '#buttonLogOut'
])

if (selector === '#inputSmsCode') {
  // We need to enter the 2FA sms code. 
} else if (selector === '#buttonLogOut') {
  // We successfully logged in
}
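One caveat with the helper above: if none of the selectors ever appears, every rejection is swallowed and the returned promise never settles. A possible variation, sketched here with Promise.any (available in Node.js 15 and later), rejects once all of the waits have failed. The name waitForAnySelectorOrFail is hypothetical.

```javascript
// Hypothetical variant: resolves with the first selector that appears,
// rejects with an AggregateError if every wait fails (e.g. all time out).
function waitForAnySelectorOrFail(page, selectors, options = {}) {
  return Promise.any(
    selectors.map((selector) =>
      page.waitForSelector(selector, options).then(() => selector)
    )
  );
}
```

Because each wait's rejection is consumed by Promise.any, the slower waits do not surface as unhandled rejections when they time out later.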

2 votes

Taking Promise.race() a step further, wrap it and check the index to drive the follow-up logic:

// Typescript
export async function racePromises(promises: Promise<any>[]): Promise<number> {
  const indexedPromises: Array<Promise<number>> = promises.map((promise, index) => new Promise<number>((resolve) => promise.then(() => resolve(index))));
  return Promise.race(indexedPromises);
}
// Javascript
export async function racePromises(promises) {
  const indexedPromises = promises.map((promise, index) => new Promise((resolve) => promise.then(() => resolve(index))));
  return Promise.race(indexedPromises);
}

Usage:

const navOutcome = await racePromises([
  page.waitForSelector('SELECTOR1'),
  page.waitForSelector('SELECTOR2')
]);
if (navOutcome === 0) {
  //logic for 'SELECTOR1'
} else if (navOutcome === 1) {
  //logic for 'SELECTOR2'
}



1 vote

If you want to wait for the first of multiple selectors and get the matching elements, you can start with waitForFunction:

const matches = await page.waitForFunction(() => {
  const matches = [...document.querySelectorAll(YOUR_SELECTOR)];
  return matches.length ? matches : null;
});

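waitForFunction returns a single handle that wraps the array, not an array of ElementHandles. If you only need native DOM methods, you don't need to get handles at all. For example, to get the text from this array: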

const contents = await matches.evaluate(els => els.map(e => e.textContent));

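In other words, matches behaves much like the array Puppeteer passes to the $$eval callback.

On the other hand, if you really do need an array of handles, the following demo code does the conversion and shows the handles being used normally: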

const puppeteer = require("puppeteer"); // ^16.2.0

const html = `
<!DOCTYPE html>
<html>
<head>
<style>
h1 {
  display: none;
}
</style>
</head>
<body>
<script>
setTimeout(() => {

  // add initial batch of 3 elements
  for (let i = 0; i < 3; i++) {
    const h1 = document.createElement("button");
    h1.textContent = \`first batch #\${i + 1}\`;
    h1.addEventListener("click", () => {
      h1.textContent = \`#\${i + 1} clicked\`;
    });
    document.body.appendChild(h1);
  }

  // add another element 1 second later to show it won't appear in the first batch
  setTimeout(() => {
    const h1 = document.createElement("h1");
    h1.textContent = "this won't be found in the first batch";
    document.body.appendChild(h1);
  }, 1000);

}, 3000); // delay before first batch of elements are added
</script>
</body>
</html>
`;

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  await page.setContent(html);

  const matches = await page.waitForFunction(() => {
    const matches = [...document.querySelectorAll("button")];
    return matches.length ? matches : null;
  });
  const length = await matches.evaluate(e => e.length);
  const handles = await Promise.all([...Array(length)].map((e, i) =>
    page.evaluateHandle((m, i) => m[i], matches, i)
  ));
  await handles[1].click(); // show that the handles work
  const contents = await matches.evaluate(els => els.map(e => e.textContent));
  console.log(contents);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

Unfortunately, it's a bit verbose, but it can be turned into a helper.
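Such a helper might look like the sketch below. It reuses only the calls from the demo (evaluate on the array handle and page.evaluateHandle with an index); the name toHandleArray is hypothetical.

```javascript
// Hypothetical helper: turns a handle that wraps an in-page array of elements
// into a plain array of per-element handles, using the same calls as the demo.
async function toHandleArray(page, arrayHandle) {
  // Ask the browser how many elements the wrapped array holds.
  const length = await arrayHandle.evaluate((arr) => arr.length);
  // Fetch one handle per index; the array handle is passed back in as an argument.
  return Promise.all(
    Array.from({ length }, (_, i) =>
      page.evaluateHandle((arr, index) => arr[index], arrayHandle, i)
    )
  );
}
```

In the demo this would replace the length and handles lines with const handles = await toHandleArray(page, matches).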

If you're interested in integrating the {visible: true} option, see also "Wait for the first visible among multiple elements matching a selector".


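0 votes

Combining some of the ideas above into a helper method, I built a function that lets me define multiple possible selector outcomes and handle the first one to resolve.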

/**
 * @typedef {import('puppeteer').ElementHandle} PuppeteerElementHandle
 * @typedef {import('puppeteer').Page} PuppeteerPage
 */

/**
 * Description of the function
 * @callback OutcomeHandler
 * @async
 * @param {PuppeteerElementHandle} element matched element
 * @returns {Promise<*>} can return anything, will be sent to handlePossibleOutcomes
 */

/**
 * @typedef {Object} PossibleOutcome
 * @property {string} selector The selector to trigger this outcome
 * @property {OutcomeHandler} handler handler will be called if selector is present
 */

/**
 * Waits for a number of selectors (Outcomes) on a Puppeteer page, and calls the handler on first to appear,
 * Outcome Handlers should be ordered by preference, as if multiple are present, only the first occurring handler
 * will be called.
 * @param {PuppeteerPage} page Puppeteer page object
 * @param {[PossibleOutcome]} outcomes each possible selector, and the handler you'd like called.
 * @returns {Promise<*>} returns the result from outcome handler
 */
async function handlePossibleOutcomes(page, outcomes) {
  var outcomeSelectors = outcomes.map(outcome => {
    return outcome.selector;
  }).join(', ');
  // The original used page.waitFor, which was deprecated and later removed.
  return page.waitForSelector(outcomeSelectors)
    .then(_ => {
      let awaitables = [];
      outcomes.forEach(outcome => {
        // Renamed from "await", which is a reserved word in async and module code.
        let pending = page.$(outcome.selector)
          .then(element => {
            if (element) {
              return [outcome, element];
            }
            return null;
          });
        awaitables.push(pending);
      });
      return Promise.all(awaitables);
    })
    .then(checked => {
      let found = null;
      checked.forEach(check => {
        if (!check) return;
        if (found) return;
        let outcome = check[0];
        let element = check[1];
        let p = outcome.handler(element);
        found = p;
      });
      return found;
    });
}

To use it, just call it with an array of possible outcomes and their selectors/handlers:

await handlePossibleOutcomes(page, [
  {
    selector: '#headerNavUserButton',
    handler: element => {
      console.log('Logged in', element);
      loggedIn = true;
      return true;
    }
  },
  {
    selector: '#email-login-password_error',
    handler: element => {
      console.log('password error', element);
      return false;
    }
  }
]).then(result => {
  if (result) {
    console.log('Logged in!', result);
  } else {
    console.log('Failed :(');
  }
})
    

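0 votes

I've just started using Puppeteer and ran into the same problem, so I wanted to write a custom function that covers the same use case.

The function is as follows: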

async function waitForMySelectors(selectors, page){
  for (let i = 0; i < selectors.length; i++) {
    await page.waitForSelector(selectors[i]);
  }
}

The first parameter takes an array of selectors, and the second is the page on which the waiting is performed.

Call the function as in the following example:

var SelectorsArray = ['#username', '#password'];
await waitForMySelectors(SelectorsArray, page);

Although I haven't tested it at all, it seems to work.
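Note that the loop above waits for each selector in turn, so it finishes only once all of them are present rather than when whichever one appears first. If waiting for all of them is what you want, a sketch with Promise.all starts the waits concurrently. The function name waitForAllSelectors is hypothetical.

```javascript
// Hypothetical helper: waits until every selector has matched, starting all
// waits at once instead of one after another.
function waitForAllSelectors(page, selectors, options = {}) {
  return Promise.all(
    selectors.map((selector) => page.waitForSelector(selector, options))
  );
}
```

The resolved array keeps the order of the input selectors, and the whole call rejects as soon as any single wait fails.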


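-1 votes

Puppeteer methods may throw errors if they are unable to fulfill a request. For example, page.waitForSelector(selector[, options]) may fail if the selector doesn't match any nodes within the given timeframe.

For certain types of errors, Puppeteer uses specific error classes. These classes are available via require('puppeteer/Errors').

List of supported classes:

TimeoutError

An example of handling a timeout error: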

const {TimeoutError} = require('puppeteer/Errors');

// ...

try {
  await page.waitForSelector('.foo');
} catch (e) {
  if (e instanceof TimeoutError) {
    // Do something if this is a timeout.
  }
}
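Applied to the original question, the same idea can be wrapped so that a timeout becomes an ordinary null result while other errors still propagate. This is a sketch under one assumption, flagged in the code: that Puppeteer's timeout errors carry the name "TimeoutError". The helper name waitForSelectorOrNull is hypothetical.

```javascript
// Hypothetical helper: returns the element handle, or null if the wait timed out.
// Assumption: Puppeteer's timeout errors carry the name "TimeoutError", which
// avoids depending on where a given version exports the class.
async function waitForSelectorOrNull(page, selector, options = {}) {
  try {
    return await page.waitForSelector(selector, options);
  } catch (err) {
    if (err && err.name === 'TimeoutError') return null;
    throw err; // anything else is a real failure
  }
}
```

If the class is importable in your Puppeteer version, the instanceof check shown above is the stricter test.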
    