JXL: changing the column processed by a JSoup-based application


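Currently, the program runs down one column of URLs and writes the scraped data to the adjacent cell. I can set which column it starts from, but that is as far as I have got, so it only works on a single column. How can I tell it to move on to column 4 (column E) and work from top to bottom once it has finished column 0 (column A)? And then possibly another one after that, say column J?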

I believe my problem lies in the "while (!(cell = sheet.getCell ..." line, but I am not sure what to change without breaking the program.

My code is below:

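import java.io.File;
import java.io.IOException;

import jxl.Cell;
import jxl.CellType;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class App {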

private static final int URL_COLUMN = 0; // Column A
private static final int PRICE_COLUMN = 1; //Column B

public static void main(final String[] args) throws Exception {

    Workbook originalWorkbook = Workbook.getWorkbook(new File("C:/Users/Shadow/Desktop/original.xls"));
    WritableWorkbook workbook = Workbook.createWorkbook(new File("C:/Users/Shadow/Desktop/updated.xls"), originalWorkbook);
    originalWorkbook.close();
    WritableSheet sheet = workbook.getSheet(0);
    int currentRow = 1;
    Cell cell;

    while (!(cell = sheet.getCell(URL_COLUMN, currentRow)).getType().equals(CellType.EMPTY)) {

        String url = cell.getContents();
        System.out.println("Checking URL: " + url);
        if (url.contains("scrapingsite1.com")) {
            String Price = ScrapingSite1(url);
            System.out.println("Scraping Site1's Price: " + Price);
            Label cellWithPrice = new Label(PRICE_COLUMN, currentRow, Price);
            sheet.addCell(cellWithPrice);
        }
        currentRow++;
    }
    workbook.write();
    workbook.close();
}

private static String ScrapingSite1 (String url) throws IOException {
    Document doc = null;
    for (int i=1; i <= 6; i++) {
        try {
            doc = Jsoup.connect(url).userAgent("Mozilla/5.0").timeout(6000).validateTLSCertificates(false).get();
            break;
        } catch (IOException e) {
            System.out.println("Jsoup issue occurred " + i + " time(s).");
        }
    }
    if (doc == null){
        return null;
    }
    else{
        return doc.select("p.price").text();
    }
}
}
java web-scraping jsoup jxl
1 Answer

To keep the code simple, I assume the price always goes into the next column (+1).

Also, to handle several columns, I replaced the single value int URL_COLUMN = 0 with an array of the columns to process: int[] URL_COLUMNS = { 0, 4, 9 }; // Columns A, E, J

You can then loop over each of the columns {0, 4, 9} and save the data into the column next to it, {1, 5, 10}:


    private static final int[] URL_COLUMNS = { 0, 4, 9 }; // Columns A, E, J

    public static void main(final String[] args) throws Exception {

        Workbook originalWorkbook = Workbook.getWorkbook(new File("C:/Users/Shadow/Desktop/original.xls"));
        WritableWorkbook workbook = Workbook.createWorkbook(new File("C:/Users/Shadow/Desktop/updated.xls"), originalWorkbook);
        originalWorkbook.close();
        WritableSheet sheet = workbook.getSheet(0);
        Cell cell;

        // loop over every column
        for (int i = 0; i < URL_COLUMNS.length; i++) {
            int currentRow = 1;
            while (!(cell = sheet.getCell(URL_COLUMNS[i], currentRow)).getType().equals(CellType.EMPTY)) {

                String url = cell.getContents();
                System.out.println("Checking URL: " + url);
                if (url.contains("scrapingsite1.com")) {
                    String Price = ScrapingSite1(url);
                    System.out.println("Scraping Site1's Price: " + Price);
                    // save price into the next column
                    Label cellWithPrice = new Label(URL_COLUMNS[i] + 1, currentRow, Price);
                    sheet.addCell(cellWithPrice);
                }
                currentRow++;
            }
        }

        workbook.write();
        workbook.close();
    }
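
To check the traversal order without opening a spreadsheet, here is a minimal sketch of the same loop run against an in-memory grid. It is not part of the JXL code above: the class ColumnWalk, the methods columnIndex and fillNextColumns, and the lookup parameter (which stands in for ScrapingSite1) are hypothetical names used only for illustration. It assumes the grid is indexed [row][column] and is wide enough to hold the column to the right of each URL column. The helper columnIndex shows how the labels A, E and J map to the zero-based indices 0, 4 and 9.

```java
import java.util.function.Function;

final class ColumnWalk {

    /** Converts a column label such as "A", "E" or "AA" to a zero-based index. */
    static int columnIndex(String label) {
        if (label == null || label.isEmpty()) {
            throw new IllegalArgumentException("Empty column label");
        }
        int index = 0;
        for (char ch : label.toUpperCase().toCharArray()) {
            if (ch < 'A' || ch > 'Z') {
                throw new IllegalArgumentException("Not a column label: " + label);
            }
            // Treat the label as a base-26 number whose digits run from 1 (A) to 26 (Z).
            index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1;
    }

    /**
     * Walks each URL column from startRow downwards until the first empty cell
     * and writes lookup(url) into the cell to its right.
     * Returns the number of cells written.
     */
    static int fillNextColumns(String[][] grid, int[] urlColumns, int startRow,
                               Function<String, String> lookup) {
        int written = 0;
        for (int column : urlColumns) {
            // The row counter restarts for every column, as in the answer above.
            for (int row = startRow; row < grid.length; row++) {
                String url = grid[row][column];
                if (url == null || url.isEmpty()) {
                    break; // plays the role of the CellType.EMPTY check
                }
                grid[row][column + 1] = lookup.apply(url);
                written++;
            }
        }
        return written;
    }
}
```

For example, ColumnWalk.fillNextColumns(grid, new int[] { 0, 4, 9 }, 1, url -> "price for " + url) fills columns 1, 5 and 10. Note that, like the while loop in the answer, each column stops at its first empty cell, so any URL below a gap in that column is skipped.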