driver.findElements(By.xpath) returns an inconsistent number of search results on https://www.amazon.com/ with Selenium Java

Question (votes: 0, answers: 2)

I'm trying to get the URL of every laptop on sale on the first 3 pages of this Amazon page.

URL: https://www.amazon.com/s?i=computers&rh=n%3A565108%2Cp_72%3A1248879011&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590091272&ref=sr_pg_1

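Every time I run the script, driver.findElements(By.xpath) returns an inconsistent number of URLs. The first page is quite consistent and returns 4 URLs. Pages 2 and 3, however, return anywhere between 1 and 4 URLs, even though page 2 has 8 of the URLs I'm looking for and page 3 has 4.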

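I suspect the problem is in the grabData method, since it fetches data based on the given list of URLs. I'm new to this, so I hope this all makes sense. Any help would be appreciated. Let me know if you need further clarification.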

public static String dealURLsXpath = "//span[@data-a-strike=\"true\" or contains(@class,\"text-strike\")][.//text()]/parent::a[@class]";
public static List<String> URLs = new ArrayList<String>();


public static void main(String[] args)
    {       
        //Initialize Browser
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\email\\eclipse-workspace\\ChromeDriver 81\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);


        //Search through laptops and starts at page 1
        Search.searchLaptop(driver);
        //Grabs data for each deal and updates Products List directly
        listingsURL = driver.getCurrentUrl();
        //updates the global URLs List with the URLs found by driver.findElements(By.xpath)
        updateURLsList(driver);
        //Iterates through each URL and grabs laptop information to add to products list
        grabData(driver, URLs, "Laptop");
        // Clears URLs list so that it can be populated by the URLs in the next page
        URLs.clear();
        // returns driver to Amazon page to click on "page 2" button to go to next page and repeat process
        driver.get(listingsURL);

        driver.findElement(By.xpath("//a [contains(@href,'pg_2')]")).click();

        listingsURL = driver.getCurrentUrl();
        updateURLsList(driver);
        grabData(driver, URLs, "Laptop");
        URLs.clear();
        driver.get(listingsURL);

        driver.findElement(By.xpath("//a [contains(@href,'pg_3')]")).click();

        listingsURL = driver.getCurrentUrl();
        updateURLsList(driver);
        grabData(driver, URLs, "Laptop");
        URLs.clear();
        driver.get(listingsURL);
    }

public static void updateURLsList(WebDriver driver)
    {
        //list of deals on amazon page
/////////////////////////////////////////////INCONSISTENT/////////////////////////////////////////////
        List<WebElement> deals = driver.findElements(By.xpath(dealURLsXpath));
//////////////////////////////////////////////////////////////////////////////////////////////////////

        System.out.println("Deals Size: " + deals.size());
        for(WebElement element : deals)
        {
            URLs.add(element.getAttribute("href"));
        }
        System.out.println("URL List size: " + URLs.size());
        deals.clear();
    }
public static void grabData(WebDriver driver, List<String> URLs, String category)
    {
        for(String url : URLs)
        {
            driver.get(url);
            String name = driver.findElement(By.xpath("//span [@id = \"productTitle\"]")).getText();
            System.out.println("Name: " + name);
            String price = driver.findElement(By.xpath("//span [@id = \"priceblock_ourprice\"]")).getText();
            System.out.println("price: " + price);
            String Xprice = driver.findElement(By.xpath("//span [@class = \"priceBlockStrikePriceString a-text-strike\"]")).getText();
            System.out.println("Xprice: " + Xprice);
            String picURL = driver.findElement(By.xpath("//img [@data-old-hires]")).getAttribute("src");
            System.out.println("picURL: " + picURL);

            BufferedImage img;

            System.out.println("URL: " + url);

            try
            {
                img = ImageIO.read(new URL(picURL));
                products.add(new Product(
                        name, 
                        Integer.parseInt(price.replaceAll("[^\\d.]", "").replace(".", "").replace(",", "")), 
                        Integer.parseInt(Xprice.replaceAll("[^\\d.]", "").replace(".", "").replace(",", "")), 
                        img, 
                        category, 
                        url));
            }
            catch(IOException e)
            {
                System.out.println("Error: " + e.getMessage());
            }
        }
    }

java selenium selenium-webdriver xpath webdriverwait
2 Answers

0 votes

You should try using an explicit wait, the Selenium way:

WebDriverWait wait = new WebDriverWait(driver, 20);
List<WebElement> deals = wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.xpath(dealURLsXpath)));
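
As far as I know, findElements under an implicit wait returns as soon as at least one match exists, so a page that renders its results in batches can still hand back a partial list. To illustrate the polling idea behind an explicit wait, here is a minimal sketch in plain Java. It is not Selenium's implementation: StableWait and pollUntilStable are hypothetical names, and the "page" is simulated with a supplier so the example runs without a browser.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Selenium): polls a query until it returns
 * the same non-empty number of items on two consecutive polls.
 */
final class StableWait {

    static <T> List<T> pollUntilStable(Supplier<List<T>> query,
                                       Duration timeout,
                                       Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        // Copy each result so later changes to the source list do not affect the snapshot.
        List<T> previous = new ArrayList<>(query.get());
        while (System.nanoTime() < deadline) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give back what we have so far.
                Thread.currentThread().interrupt();
                return previous;
            }
            List<T> current = new ArrayList<>(query.get());
            // Stop once the count is non-empty and unchanged since the last poll.
            if (!current.isEmpty() && current.size() == previous.size()) {
                return current;
            }
            previous = current;
        }
        // Timed out: return the last snapshot instead of throwing.
        return previous;
    }

    public static void main(String[] args) {
        // Simulated page: one more "link" becomes available on each poll, up to 4.
        List<String> loaded = new ArrayList<>();
        Supplier<List<String>> fakePage = () -> {
            if (loaded.size() < 4) {
                loaded.add("https://example.com/item" + loaded.size());
            }
            return loaded;
        };
        List<String> links =
            pollUntilStable(fakePage, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("Links found: " + links.size());
    }
}
```

With Selenium on the classpath, the supplier could be something like () -> driver.findElements(By.xpath(dealURLsXpath)). Returning the last snapshot on timeout is a choice made for this sketch; WebDriverWait throws a TimeoutException instead. If the expected count is known in advance, ExpectedConditions.numberOfElementsToBeMoreThan may be a simpler option.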

0 votes

To get the href attribute of every laptop on sale on the first three pages of this Amazon page, you need to induce WebDriverWait for visibilityOfAllElementsLocatedBy(), and you can use the following Locator Strategy:

  • Code block:

    driver.get("https://www.amazon.com/s?i=computers&rh=n%3A565108%2Cp_72%3A1248879011&pf_rd_i=565108&pf_rd_p=b2e34a42-7eb2-50c2-8561-292e13c797df&pf_rd_r=CP4KYB71SY8E0WPHYJYA&pf_rd_s=merchandised-search-11&pf_rd_t=BROWSE&qid=1590091272&ref=sr_pg_1");
    List<WebElement> deals = new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.xpath("//span[@class='a-price a-text-price']//parent::a[1]")));
    for(WebElement deal:deals)
        System.out.println(deal.getAttribute("href"));

Similarly, page 2 and page 3 give 4 URLs each.
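
To see what the XPath in this answer selects, it can be tried offline against a small fixture using the JDK's built-in javax.xml.xpath. This is only a sketch: the markup in FRAGMENT is made up for illustration and is not Amazon's real page source, and LocatorDemo and hrefs are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LocatorDemo {

    // Made-up, well-formed markup for illustration only (assumption, not real page source).
    static final String FRAGMENT =
          "<div>"
        + "<a href='/deal-1'><span class='a-price a-text-price'>$999.99</span></a>"
        + "<a href='/deal-2'><span class='a-price a-text-price'>$749.00</span></a>"
        + "<a href='/no-deal'><span class='a-price'>$500.00</span></a>"
        + "</div>";

    /** Evaluates the XPath against the markup and returns the href of each matched element. */
    static List<String> hrefs(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or markup: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Same locator as in the answer: anchors that are parents of a struck-price span.
        String locator = "//span[@class='a-price a-text-price']//parent::a[1]";
        System.out.println(hrefs(FRAGMENT, locator));
    }
}
```

On this fixture the locator matches the two anchors whose span carries exactly the class value 'a-price a-text-price' and skips the third. Note that @class='...' is an exact string comparison, so a span with extra or reordered class names would not match; contains(@class, ...) as in the question's XPath is the looser alternative.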
