IllegalArgumentException,从Spark(Scala)将ML模型写入s3时错误的FS

问题描述 投票:1回答:2

我创建了一个模型:

val model = pipeline.fit(commentLower)

而我正在尝试将其写入s3:

sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "MYACCESSKEY")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "MYSECRETKEY")
model.write.overwrite().save("s3n://sparkstore/model")

但我收到这个错误:

Name: java.lang.IllegalArgumentException
Message: Wrong FS: s3n://sparkstore/model, expected: file:///
StackTrace: org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)

我也尝试使用内联访问键:

model.write.overwrite().save("s3n://MYACCESSKEY:MYSECRETKEY@/sparkstore/model")

如何从Spark编写模型(或任何文件)到s3?

scala apache-spark amazon-s3 ibm-cloud apache-spark-ml
2个回答
2
投票

我没有S3连接来测试。但这就是我的想法,你应该使用: -

val hconf=sc.hadoopConfiguration
hconf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hconf.set("fs.s3.awsAccessKeyId", "MYACCESSKEY")
hconf.set("fs.s3.awsSecretAccessKey", "MYSECRETKEY")

当我做df.write.save("s3://sparkstore/model")我得到Name: org.apache.hadoop.fs.s3.S3Exception Message: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/model' - ResponseCode=403, ResponseMessage=Forbidden StackTrace: org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:229) org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111) s

这让我相信它确实识别了s3 fs的s3协议。但它的认证失败是显而易见的。

希望它可以解决您的问题。

谢谢,查尔斯。


0
投票

这不是我想要做的,但我发现了一个类似的问题:

How to save models from ML Pipeline to S3 or HDFS?

这就是我最终做的事情:

sc.parallelize(Seq(model), 1).saveAsObjectFile("swift://RossL.keystone/model")
val modelx = sc.objectFile[PipelineModel]("swift://RossL.keystone/model").first()
© www.soinside.com 2019 - 2024. All rights reserved.