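I am trying to connect to IBM COS (Cloud Object Storage) using Spark. Spark version = 2.4.4, Scala version = 2.11.12.
I am running locally with the correct credentials, but I get the following error: "No FileSystem for scheme: cos".
I am sharing the code snippet and the error log below. Can anyone help me resolve this?
Thanks in advance!
Code snippet: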
import com.ibm.ibmos2spark.CloudObjectStorage
import org.apache.spark.sql.SparkSession
object CosConnection extends App {
  var credentials = scala.collection.mutable.HashMap[String, String](
    "endPoint" -> "ENDPOINT",
    "accessKey" -> "ACCESSKEY",
    "secretKey" -> "SECRETKEY"
  )
  var bucketName = "FOO"
  var objectname = "xyz.csv"
  var configurationName = "softlayer_cos"
  val spark = SparkSession
    .builder()
    .appName("Connect IBM COS")
    .master("local")
    .getOrCreate()
  spark.sparkContext.hadoopConfiguration.set("fs.stocator.scheme.list", "cos")
  spark.sparkContext.hadoopConfiguration.set("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
  spark.sparkContext.hadoopConfiguration.set("fs.stocator.cos.scheme", "cos")
  var cos = new CloudObjectStorage(spark.sparkContext, credentials, configurationName = configurationName)
  var dfData1 = spark.
    read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").
    option("header", "true").
    option("inferSchema", "true").
    load(cos.url(bucketName, objectname))
  dfData1.printSchema()
  dfData1.show(5, 0)
}
ERROR:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: cos
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
This issue was resolved by adding the following stocator dependency, with Spark version = 2.4.4 and Scala version = 2.11.12.
// https://mvnrepository.com/artifact/com.ibm.stocator/stocator
libraryDependencies += "com.ibm.stocator" % "stocator" % "1.0.24"
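For context, a minimal build.sbt around that line might look like the sketch below. Only the stocator line comes from the fix itself; the project name, project version, and the Spark SQL entry are assumptions chosen to match the versions stated in the question.

```scala
// build.sbt (sketch; project name and version are hypothetical)
name := "cos-connection"
version := "0.1.0"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark SQL matching the Spark version stated in the question
  "org.apache.spark" %% "spark-sql" % "2.4.4",
  // stocator is a Java artifact, so it uses a single % (no Scala version suffix)
  "com.ibm.stocator" % "stocator" % "1.0.24"
)
```

The ibmos2spark library imported in the question would need its own dependency entry; its coordinates are not given in this thread, so it is left out here.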
Make sure stocator-1.0.24-jar-with-dependencies.jar appears in your external libraries when you build the package.
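One quick way to confirm that the jar actually made it onto the classpath is to try loading a stocator class before starting Spark. This is only a sketch: the object name and helper are hypothetical, and the class name is the one used in the fs.stocator.cos.impl setting in the question.

```scala
import scala.util.Try

object StocatorClasspathCheck {
  // Returns true if the named class can be located on the current classpath.
  // initialize = false avoids running the class's static initializers.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    // Class name taken from the fs.stocator.cos.impl setting in the question.
    val stocatorClient = "com.ibm.stocator.fs.cos.COSAPIClient"
    if (isOnClasspath(stocatorClient))
      println(s"Found $stocatorClient on the classpath")
    else
      println(s"$stocatorClient is missing; check that the stocator jar is included")
  }
}
```

If the class is reported missing, the stocator jar is probably not being picked up by the build or the run configuration.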
Also make sure you pass your endpoint as s3.us.cloud-object-storage.appdomain.cloud
rather than https://s3.us.cloud-object-storage.appdomain.cloud
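If the endpoint comes from a config file or a copied URL, a small helper can strip the scheme before it goes into the credentials map. This is a minimal sketch; EndpointUtil and stripScheme are hypothetical names, not part of ibmos2spark or stocator.

```scala
object EndpointUtil {
  // Drops a leading "http://" or "https://" (case-insensitive) so that
  // only the host name remains.
  def stripScheme(endpoint: String): String =
    endpoint.trim.replaceFirst("(?i)^https?://", "")

  def main(args: Array[String]): Unit = {
    // prints s3.us.cloud-object-storage.appdomain.cloud
    println(stripScheme("https://s3.us.cloud-object-storage.appdomain.cloud"))
  }
}
```

The result can then be used as the "endPoint" value in the credentials map from the question.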
You can also build the stocator jar manually and add the resulting
target/stocator-1.0.24-SNAPSHOT-IBM-SDK.jar to the classpath (if needed).
git clone https://github.com/SparkTC/stocator
cd stocator
git fetch
git checkout -b 1.0.24-ibm-sdk origin/1.0.24-ibm-sdk
mvn clean install -DskipTests
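With an sbt project, one way to put the locally built jar on the classpath is to declare it as an unmanaged jar. This is a sketch, assuming an sbt build; the path is a placeholder for wherever the stocator checkout lives.

```scala
// build.sbt (sketch): add the locally built jar as an unmanaged dependency.
// The path below is a placeholder; point it at your own stocator checkout.
unmanagedJars in Compile += Attributed.blank(
  file("/path/to/stocator/target/stocator-1.0.24-SNAPSHOT-IBM-SDK.jar")
)
```

Alternatively, copying the jar into the project's lib/ directory should work without any build changes, since sbt treats jars there as unmanaged dependencies by default. Newer sbt versions prefer the `Compile / unmanagedJars` spelling of the same key.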