Unable to connect to Google Cloud Storage files using the GCS connector for Spark

Question  Votes: 10  Answers: 1

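I have written a Spark job on my local machine that uses the Google Hadoop connector to read a file from Google Cloud Storage, e.g. gs://storage.googleapis.com/, as described in https://cloud.google.com/dataproc/docs/connectors/cloud-storage

I have set up a service account with Compute Engine and Storage permissions. My Spark configuration and code are: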

    SparkConf conf = new SparkConf();
    conf.setAppName("SparkAPp").setMaster("local");
    conf.set("google.cloud.auth.service.account.enable", "true");
    conf.set("google.cloud.auth.service.account.email", "[email protected]");
    conf.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
    conf.set("fs.gs.project.id", "xxx-990711");
    conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");

    SparkContext sparkContext = new SparkContext(conf);
    JavaRDD<String> data = sparkContext.textFile("gs://storage.googleapis.com/xxx/xxx.txt", 0).toJavaRDD();
    data.foreach(line -> System.out.println(line));

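I have also set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the key file. I have tried both kinds of key file, JSON and P12, but I still cannot access the file. The error I get is: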

java.net.UnknownHostException: metadata
java.io.IOException: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
        at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:208)
        at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:70)

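I am running the job from Eclipse with Java 8, Spark 2.2.0 dependencies and gcs-connector 1.6.1-hadoop2. I need to connect using the service account only, not through the OAuth mechanism.

Thanks in advance.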

java apache-spark google-cloud-storage google-cloud-dataproc service-accounts
1 Answer
1 vote

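Are you running this locally? If so, either set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to your key.json, or set these properties on the Hadoop Configuration instead of on the SparkConf. Properties set on SparkConf reach the Hadoop configuration only when their keys carry the spark.hadoop. prefix, so the connector most likely never sees your settings and falls back to the metadata server, which is the error you are getting. For example: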

    // org.apache.hadoop.conf.Configuration, taken from the running SparkContext
    Configuration hadoopConfiguration = sparkContext.hadoopConfiguration();
    // Configuration.set takes String values, so pass "true" rather than a boolean
    hadoopConfiguration.set("google.cloud.auth.service.account.enable", "true");
    hadoopConfiguration.set("google.cloud.auth.service.account.email", "[email protected]");
    hadoopConfiguration.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
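
If you would rather keep everything on the SparkConf, another option is to prefix each Hadoop key with spark.hadoop., since Spark copies entries with that prefix into the Hadoop configuration it creates. Below is a minimal sketch of that renaming using only the JDK. The class GcsSparkProps and the method withSparkHadoopPrefix are hypothetical names made up for illustration (they are not part of Spark or the GCS connector), and /path/to/key.p12 is a placeholder path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spark or the GCS connector.
class GcsSparkProps {

    // Spark copies every "spark.hadoop.<key>" entry of SparkConf into the
    // Hadoop Configuration as "<key>".
    static final String PREFIX = "spark.hadoop.";

    // Returns a copy of the given properties with the prefix added to every
    // key that does not already carry it. Insertion order is preserved.
    static Map<String, String> withSparkHadoopPrefix(Map<String, String> hadoopProps) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hadoopProps.entrySet()) {
            String key = e.getKey();
            out.put(key.startsWith(PREFIX) ? key : PREFIX + key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> gcs = new LinkedHashMap<>();
        gcs.put("google.cloud.auth.service.account.enable", "true");
        gcs.put("google.cloud.auth.service.account.keyfile", "/path/to/key.p12"); // placeholder path
        gcs.put("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");

        // With Spark on the classpath, each printed pair would instead be
        // applied to the SparkConf as conf.set(key, value).
        withSparkHadoopPrefix(gcs).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In the real job you would loop over the returned map and call conf.set(key, value) before creating the SparkContext. I have not tested this against gcs-connector 1.6.1, so treat it as a starting point rather than a verified fix.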