I am trying to load data into S3 with the following code:
String schema = <some_json_data>;
Schema.Parser parser = new Schema.Parser().setValidate(true);
Schema avroSchema = parser.parse(schema);
Configuration conf = new Configuration();
conf.set("fs.s3a.access.key", "<access key id>");
conf.set("fs.s3a.secret.key", "<secret key>");
conf.set("fs.s3a.endpoint", "s3.amazonaws.com");
conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()); // Not needed unless you reference the hadoop-hdfs library.
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName()); // Uncomment if you get "No FileSystem for scheme: file" errors
Path path = new Path("s3a://<bucket>/<directory>/my.parquet");
try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter.<GenericData.Record>builder(path)
        .withSchema(avroSchema)
        .withConf(new Configuration())
        .withCompressionCodec(SNAPPY)
        .withWriteMode(OVERWRITE)
        .build()) {
    GenericData.Record record = new GenericData.Record(avroSchema);
    record.put("myInteger", 1);
    record.put("myString", "string value 1");
    GenericData.Record record1 = new GenericData.Record(avroSchema);
    record1.put("myInteger", 2);
    record1.put("myString", "string value 2");
    writer.write(record);
    writer.write(record1);
    writer.write(record);
} catch (Exception ex) {
    ex.printStackTrace(System.out);
}
I am using the following dependencies: org.apache.avro avro 1.8.2, org.apache.hadoop hadoop-core 1.2.1, org.apache.parquet parquet-hadoop 1.8.1, org.apache.parquet parquet-avro 1.8.1, org.apache.hadoop hadoop-azure-datalake 3.2.1. And I get the following error: java.io.IOException: No FileSystem for scheme: s3a.
Is it possible to write Parquet data to S3 without Spark?
You are missing this configuration:
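The configuration that is typically missing when "No FileSystem for scheme: s3a" is thrown is the mapping from the `s3a://` scheme to its `FileSystem` implementation class. A sketch, assuming the `hadoop-aws` artifact (which provides `org.apache.hadoop.fs.s3a.S3AFileSystem`, together with its AWS SDK dependency) is on the classpath — note that `hadoop-core` 1.2.1 does not ship S3A at all:

```java
// Register the FileSystem implementation backing the "s3a://" scheme,
// analogous to the fs.hdfs.impl / fs.file.impl lines already in the question.
conf.set("fs.s3a.impl", org.apache.hadoop.fs.s3a.S3AFileSystem.class.getName());
```

Also note that the builder in the question is given `.withConf(new Configuration())`, which discards every `conf.set(...)` made earlier (credentials, endpoint, and this `fs.s3a.impl` entry); passing the configured object with `.withConf(conf)` keeps them.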