How to connect to IBM COS (Cloud Object Storage) from Spark, and how to fix "No FileSystem for scheme: cos"


I am trying to create a connection from Spark to IBM COS (Cloud Object Storage). Spark version = 2.4.4, Scala version = 2.11.12.

I am running locally with the correct credentials, but I get the following error: "No FileSystem for scheme: cos".

I am sharing the code snippet along with the error log. Can anyone help me resolve this issue?

Thanks in advance!

Code snippet:

import com.ibm.ibmos2spark.CloudObjectStorage
import org.apache.spark.sql.SparkSession

object CosConnection extends App{
  var credentials = scala.collection.mutable.HashMap[String, String](
      "endPoint"->"ENDPOINT",
      "accessKey"->"ACCESSKEY",
      "secretKey"->"SECRETKEY"
  )
  var bucketName = "FOO"
  var objectname = "xyz.csv"

  var configurationName = "softlayer_cos" 

  val spark = SparkSession
    .builder()
    .appName("Connect IBM COS")
    .master("local")
    .getOrCreate()


  spark.sparkContext.hadoopConfiguration.set("fs.stocator.scheme.list", "cos")
  spark.sparkContext.hadoopConfiguration.set("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
  spark.sparkContext.hadoopConfiguration.set("fs.stocator.cos.scheme", "cos")

  var cos = new CloudObjectStorage(spark.sparkContext, credentials, configurationName=configurationName)

  var dfData1 = spark.
    read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").
    option("header", "true").
    option("inferSchema", "true").
    load(cos.url(bucketName, objectname))

  dfData1.printSchema()
  dfData1.show(5,0)
}

ERROR:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: cos
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
1 Answer

The issue was resolved by adding the following stocator dependency for Spark version = 2.4.4, Scala version = 2.11.12:

// https://mvnrepository.com/artifact/com.ibm.stocator/stocator
libraryDependencies += "com.ibm.stocator" % "stocator" % "1.0.24"
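For context, a minimal build.sbt sketch with this dependency alongside Spark (the ibmos2spark coordinates and versions here are assumptions based on the imports in the question, not taken from the original answer):

```scala
// build.sbt -- minimal sketch; spark-sql is "provided" if you run via spark-submit
name := "cos-connection"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.4",
  // stocator supplies the cos:// FileSystem implementation
  "com.ibm.stocator" % "stocator" % "1.0.24",
  // ibmos2spark provides the CloudObjectStorage helper (version assumed)
  "com.ibm.ibmos2spark" %% "ibmos2spark" % "1.1.4"
)
```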

Make sure you have stocator-1.0.24-jar-with-dependencies.jar under External Libraries when you build the package.

Also make sure you pass the endpoint as s3.us.cloud-object-storage.appdomain.cloud rather than https://s3.us.cloud-object-storage.appdomain.cloud.
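For reference, these are the Hadoop properties stocator reads; the ibmos2spark helper sets the per-service keys for you from the credentials map, but it can help to see them spelled out when debugging. The service name "softlayer_cos" matches the configurationName in the question; the credential values are placeholders:

```scala
// Sketch of the stocator/COS Hadoop configuration as a plain key-value map.
// Note fs.cos.impl, which Hadoop consults to resolve the "cos" scheme --
// it only works once the stocator jar is actually on the classpath.
val service = "softlayer_cos"
val cosConf: Map[String, String] = Map(
  "fs.stocator.scheme.list" -> "cos",
  "fs.cos.impl" -> "com.ibm.stocator.fs.ObjectStoreFileSystem",
  "fs.stocator.cos.impl" -> "com.ibm.stocator.fs.cos.COSAPIClient",
  "fs.stocator.cos.scheme" -> "cos",
  // per-service credentials; note the endpoint has no https:// prefix
  s"fs.cos.$service.endpoint" -> "s3.us.cloud-object-storage.appdomain.cloud",
  s"fs.cos.$service.access.key" -> "ACCESSKEY",
  s"fs.cos.$service.secret.key" -> "SECRETKEY"
)
// Apply with:
//   cosConf.foreach { case (k, v) => spark.sparkContext.hadoopConfiguration.set(k, v) }
println(cosConf(s"fs.cos.$service.endpoint"))
```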

You can also build the stocator jar manually and add the resulting target/stocator-1.0.24-SNAPSHOT-IBM-SDK.jar to the classpath if needed:

git clone https://github.com/SparkTC/stocator
cd stocator
git fetch
git checkout -b 1.0.24-ibm-sdk origin/1.0.24-ibm-sdk
mvn clean install -DskipTests
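Once the jar is on the classpath, paths use the scheme cos://&lt;bucket&gt;.&lt;service&gt;/&lt;object&gt;, which is what cos.url(bucketName, objectname) produces in the question's code. A standalone sketch of that URL construction (the helper function here is illustrative, not part of ibmos2spark):

```scala
// Builds a stocator COS URL of the form cos://<bucket>.<service>/<object>.
// The service segment must match the name used in the fs.cos.<service>.* keys.
def cosUrl(bucket: String, service: String, objectName: String): String =
  s"cos://$bucket.$service/$objectName"

println(cosUrl("FOO", "softlayer_cos", "xyz.csv"))  // cos://FOO.softlayer_cos/xyz.csv
```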