Apache Spark error when unloading data to AWS S3 using Hadoop

Problem description

I am using Apache Spark v2.3.1 and trying to unload data to AWS S3 after processing it, with something like this:

data.write().parquet("s3a://" + bucketName + "/" + location);

The configuration seems fine:

        String region = System.getenv("AWS_REGION");
        String accessKeyId = System.getenv("AWS_ACCESS_KEY_ID");
        String secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY");

        spark.sparkContext().hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsRegion", region);
        spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsAccessKeyId", accessKeyId);
        spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", secretAccessKey);
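
For reference, the Hadoop S3A documentation lists the credential properties as fs.s3a.access.key and fs.s3a.secret.key, with the region normally expressed through fs.s3a.endpoint rather than an fs.s3a.awsRegion key. A rough sketch of the same block using those names (the endpoint mapping from AWS_REGION is just an assumption for illustration):

    // import org.apache.hadoop.conf.Configuration;
    Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    hadoopConf.set("fs.s3a.access.key", accessKeyId);
    hadoopConf.set("fs.s3a.secret.key", secretAccessKey);
    // Assumed mapping from AWS_REGION to an S3 endpoint, e.g. "s3.eu-west-1.amazonaws.com"
    hadoopConf.set("fs.s3a.endpoint", "s3." + region + ".amazonaws.com");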

%HADOOP_HOME% points to exactly the same version that Spark uses (v2.6.5) and has been added to the Path:

C:\>hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  key                  manage keys via the KeyProvider
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME 

The same goes for Maven:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>2.6.5</version>
    </dependency>

But I still get the following error on write. Any ideas?

Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) ~[hadoop-common-2.6.5.jar:?]
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557) ~[hadoop-common-2.6.5.jar:?]
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977) ~[hadoop-common-2.6.5.jar:?]
Tags: java, windows, amazon-web-services, apache-spark, hadoop
1 Answer

Yes, I was missing a step. Grab the files from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.4/bin and drop them into %HADOOP_HOME%\bin. This seems to work even though the versions don't match exactly (v2.6.5 vs v2.6.4).
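
In case it helps others hitting the same UnsatisfiedLinkError: on Windows, Hadoop resolves winutils.exe through the HADOOP_HOME environment variable (or the hadoop.home.dir system property), so a quick sanity check before building the SparkSession makes the failure mode obvious. A minimal sketch, with an assumed fallback path:

    // import java.io.File;
    // Hadoop's Windows shell support looks for %HADOOP_HOME%\bin\winutils.exe
    // (or ${hadoop.home.dir}\bin\winutils.exe). Fail fast if it is missing.
    String hadoopHome = System.getenv("HADOOP_HOME");
    if (hadoopHome == null) {
        hadoopHome = "C:\\hadoop-2.6.5";                   // hypothetical install location
        System.setProperty("hadoop.home.dir", hadoopHome); // must be set before Hadoop classes initialise
    }
    File winutils = new File(hadoopHome, "bin\\winutils.exe");
    if (!winutils.isFile()) {
        throw new IllegalStateException("winutils.exe not found at " + winutils.getAbsolutePath());
    }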
