Selenium can't find an element even after scrolling it into view


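I am trying to scrape some content from a website (the URL is in the code). I can scrape the brand name and the SDR value, but I can't seem to scrape anything below the SDR. I am only testing on the first result; once I figure it out, I will make it dynamic. Anyone who has Selenium and chromedriver in their project should be able to copy and paste this code.

The code below throws the following error: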

Exception in thread "main" org.openqa.selenium.TimeoutException: Expected condition failed: waiting for visibility of element located by By.xpath: /html/body/app-root/ecl-app/div[2]/app-search-page/app-search-container/div/div/section/div/app-elec-display-search-result/app-search-result/eui-block-content/div/app-search-result-item[1]/article/div[2]/div/app-elec-display-search-result-parameters/app-search-parameter-item[4]/div[2]/div/div[2]/div/div[1]/span (tried for 10 second(s) with 500 milliseconds interval)

Code:

public void scrape() throws InterruptedException {
    System.out.println("Starting Scrape!");
    String url = "https://eprel.ec.europa.eu/screen/product/electronicdisplays";

    WebDriver driver = new ChromeDriver();
    WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
    driver.get(url);
    driver.manage().window().maximize();
    WebElement until = wait.until(ExpectedConditions.presenceOfElementLocated(By.className("eui-block-content__wrapper")));
    //The results have been loaded now

    //Click on accept cookie page:
    new WebDriverWait(driver, Duration.ofSeconds(3))
            .until(ExpectedConditions.elementToBeClickable(By.linkText("Accept all cookies")))
            .click();

    String moreButton = "/html/body/app-root/ecl-app/div[2]/app-search-page/app-search-container/div/div/section/div/app-elec-display-search-result/app-search-result/eui-block-content/div/app-search-result-item[1]/article/div[3]/div/a";
    String xPathBrandName =     "/html/body/app-root/ecl-app/div[2]/app-search-page/app-search-container/div/div/section/div/app-elec-display-search-result/app-search-result/eui-block-content/div/app-search-result-item[1]/article/div[1]/div/div/div[1]/span[1]";
    String xPathSDR =           "/html/body/app-root/ecl-app/div[2]/app-search-page/app-search-container/div/div/section/div/app-elec-display-search-result/app-search-result/eui-block-content/div/app-search-result-item[1]/article/div[2]/div/app-elec-display-search-result-parameters/app-search-parameter-item[3]/div[1]/div/div[2]/div/div[1]/span";
    String energyRatingString = "/html/body/app-root/ecl-app/div[2]/app-search-page/app-search-container/div/div/section/div/app-elec-display-search-result/app-search-result/eui-block-content/div/app-search-result-item[1]/article/div[2]/div/app-elec-display-search-result-parameters/app-search-parameter-item[4]/div[2]/div/div[2]/div/div[1]/span";

    //Clicking on more button to load more results to be visible
    driver.findElement(By.xpath(moreButton)).click();

    WebElement SDR = driver.findElement(By.xpath(xPathSDR));

    //Using this logic to scroll to each of the result so it's visible on the web-page
    JavascriptExecutor js = (JavascriptExecutor) driver;
    js.executeScript("arguments[0].scrollIntoView();", SDR);

    WebElement brandName = driver.findElement(By.xpath(xPathBrandName));
    WebElement energyRating = wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath(energyRatingString)));

    System.out.println("Brand name: " + brandName.getText());
    System.out.println("SDR name: " + SDR.getText());
    System.out.println("energyRating: " + energyRating.getText());
}

However, replacing

WebElement energyRating = wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath(energyRatingString)));

with

WebElement energyRating = driver.findElement(By.xpath(energyRatingString));

gives the following output:

Starting Scrape!
Brand name: Samsung
SDR name: 63
energyRating: 

So I am confused about why energyRating is empty and why no NoSuchElementException is thrown.

java selenium-webdriver selenium-chromedriver browser-automation
1 Answer

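The problem is that each field exists twice in the page, once visible and once hidden. Your XPath points to one of the hidden elements, and because that element never becomes visible, the wait times out. It also explains the empty output: findElement() does locate the hidden element, so no NoSuchElementException is thrown, but getText() only returns rendered text, which for a hidden element is an empty string.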
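As a rough sketch of the "two copies, one hidden" situation, the helper below picks the first candidate that passes a visibility check. It is written generically so it runs without a browser; `FirstDisplayed`, `firstDisplayed`, and `FakeElement` are hypothetical names for illustration and not part of Selenium.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class FirstDisplayed {
    /** Returns the first candidate that passes the visibility check, if any. */
    public static <T> Optional<T> firstDisplayed(List<T> candidates, Predicate<T> isDisplayed) {
        return candidates.stream().filter(isDisplayed).findFirst();
    }

    /** Hypothetical stand-in for a page element so the sketch runs without a browser. */
    static final class FakeElement {
        final String text;
        final boolean displayed;

        FakeElement(String text, boolean displayed) {
            this.text = text;
            this.displayed = displayed;
        }

        boolean isDisplayed() {
            return displayed;
        }
    }

    public static void main(String[] args) {
        // Two copies of the same field; the hidden one comes first in document order.
        List<FakeElement> copies = Arrays.asList(
                new FakeElement("G", false),
                new FakeElement("G", true));
        Optional<FakeElement> visible = firstDisplayed(copies, FakeElement::isDisplayed);
        System.out.println(visible.isPresent()
                ? "visible copy found: " + visible.get().text
                : "no visible copy");
    }
}
```

With Selenium on the classpath, the same helper could presumably be called as firstDisplayed(result.findElements(locator), WebElement::isDisplayed), which avoids depending on which of the two copies a locator happens to match first.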

I wrote my own code to do what you describe.

String url = "https://eprel.ec.europa.eu/screen/product/electronicdisplays";

WebDriver driver = new ChromeDriver();
driver.manage().window().maximize();
driver.get(url);

// One anchor element per product; every lookup below is scoped to it.
List<WebElement> results = driver.findElements(By.cssSelector("app-search-result-item"));
String brandName = "";
String sdr = "";
String energyRating = "";
for (WebElement result : results) {
    // The leading "." keeps the XPath relative to this result;
    // a bare "//" would search the whole page and always hit the first link.
    result.findElement(By.xpath(".//a[text()=' More ']")).click();
    brandName = result.findElement(By.cssSelector("span.ecl-u-type-2xl")).getText();
    sdr = result.findElement(By.cssSelector("app-search-parameter-item[label='field.electronic-display.powerOnModeSDRV2'] div.ecl-u-d-l-block span.ecl-u-type-bold")).getText();
    energyRating = result.findElement(By.cssSelector("app-search-parameter-item[label='field.electronic-display.energyClassHDR'] div.ecl-u-d-l-block span.ecl-u-type-bold")).getText();
    result.findElement(By.xpath(".//a[text()=' Less ']")).click();

    System.out.println("Brand name: " + brandName);
    System.out.println("SDR name: " + sdr);
    System.out.println("energyRating: " + energyRating);
}

It outputs

Brand name: Samsung
SDR name: 63
energyRating: G
Brand name: Samsung
SDR name: 63
energyRating: G
...

Some feedback...

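  1. Absolute XPaths (paths that start at /html), overly long XPaths (many element levels), and indexed XPaths (/div[2] and so on) are all brittle, because the smallest change to the page will break your locators. I assume you are new to this, and that is a fine place to start, but if you plan to keep writing scripts, learning to hand-craft your own locators will be very valuable.
  2. When scraping a page like this with repeated sections, find the topmost element that contains each section, for example <app-search-result-item>. Put those elements in a list and iterate over it, which makes this kind of task much easier. In each iteration, search from that anchor element so that you only find the data that belongs to that product. That is why my code has so many result.findElement() calls: result is the product from the list that I am currently looping over.
  3. Note that I never scrolled the page. In general you don't need to scroll; Selenium handles that for you.
  4. While WebDriverWait is good practice, waits are not always needed.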
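
To illustrate point 2, here is a minimal sketch of how XPath scoping behaves when searching from an anchor element. It uses the JDK's built-in XPath engine on a tiny made-up XML snippet rather than Selenium or the real page, so the class name `XPathScopeDemo`, the method `countFromSecondItem`, and the markup are all assumptions for illustration. The point it shows is that an expression beginning with `//` is evaluated from the document root even when a context node is supplied, while `.//` stays inside the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathScopeDemo {
    // Hypothetical stand-in for two result items; not the real page markup.
    static final String XML =
            "<results>"
          + "<item><brand>A</brand><a>More</a></item>"
          + "<item><brand>B</brand><a>More</a></item>"
          + "</results>";

    /** Evaluates expr with the second <item> as context node and returns the number of matches. */
    public static int countFromSecondItem(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node secondItem = (Node) xpath.evaluate("/results/item[2]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, secondItem, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // "//a" ignores the context node and matches links in every item.
        System.out.println("//a  matches: " + countFromSecondItem("//a"));
        // ".//a" matches only links inside the second item.
        System.out.println(".//a matches: " + countFromSecondItem(".//a"));
    }
}
```

The same rule applies to result.findElement(By.xpath(...)) in Selenium: an XPath that starts with // searches the whole page, so a leading dot is needed to keep the lookup inside the current result.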