How to read multiple Excel sheets with Spark SQL in Java

Question

I am using a Maven project to read an Excel file containing two sheets with Spark SQL.

Sheets: Sheet1, Sheet2

When I try the code below, it runs without errors, but the sheet option has no effect: no matter what I set `sheetName` to, it always reads Sheet1. Can anyone point out what I am missing?

Dependencies

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.crealytics/spark-excel -->
    <dependency>
        <groupId>com.crealytics</groupId>
        <artifactId>spark-excel_2.11</artifactId>
        <version>0.11.1</version>
    </dependency>
</dependencies>

Code

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadExcelSheets {

  public static void main(String[] args) {

    Logger.getLogger("org").setLevel(Level.ERROR);

    SparkSession spark = SparkSession
            .builder()
            .appName("Java Spark SQL Example")
            .config("spark.master", "local")
            .getOrCreate();


    Dataset<Row> df = spark.read()
            .format("com.crealytics.spark.excel")
            .option("useHeader", "true")
            .option("sheetName", "Sheet2")
            .load("datasets/test1.xlsx");

    df.show();
  }
}
java excel apache-spark apache-spark-sql
1 Answer

Did you find a solution?
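One thing worth checking: in the spark-excel 0.11.x line, sheet selection reportedly moved from the `sheetName` option to a `dataAddress` option that takes an Excel-style reference such as `'Sheet2'!A1`, which would explain why `sheetName` is silently ignored. The sketch below builds that address string; `sheetAddress` is a hypothetical helper for illustration, not part of the spark-excel API, and the commented read is untested.

```java
// Sketch only, assuming spark-excel 0.11.x selects sheets via "dataAddress"
// (e.g. "'Sheet2'!A1") rather than the older "sheetName" option.
public class SheetAddress {

    // Hypothetical helper: builds an Excel-style data address for a sheet,
    // quoting the sheet name and starting at cell A1.
    static String sheetAddress(String sheetName) {
        return "'" + sheetName + "'!A1";
    }

    public static void main(String[] args) {
        // In the question's code, the read would then become (untested):
        //   Dataset<Row> df = spark.read()
        //           .format("com.crealytics.spark.excel")
        //           .option("useHeader", "true")
        //           .option("dataAddress", sheetAddress("Sheet2"))
        //           .load("datasets/test1.xlsx");
        System.out.println(sheetAddress("Sheet2"));
    }
}
```

Reading both sheets would then be two separate `spark.read()` calls, one per `dataAddress`, optionally combined with `unionByName`.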
