Spark - creating an HFile with multiple columns for one rowKey

Question · Votes: 0 · Answers: 2
JavaRDD<String> hbaseFile = jsc.textFile(HDFS_MASTER + HBASE_FILE);
JavaPairRDD<ImmutableBytesWritable, KeyValue> putJavaRDD = hbaseFile.mapToPair(line -> convertToKVCol1(line, COLUMN_AGE));
// sortByKey returns a new RDD; the result must be kept, or the save below writes unsorted data
putJavaRDD = putJavaRDD.sortByKey(true);
putJavaRDD.saveAsNewAPIHadoopFile(stagingFolder, ImmutableBytesWritable.class, KeyValue.class, HFileOutputFormat2.class, conf);

// Parse one JSON line into a bean, build the composite row key, and emit
// a single KeyValue for the given column qualifier.
private static Tuple2<ImmutableBytesWritable, KeyValue> convertToKVCol1(String beanString, byte[] column) {
    InspurUserEntity inspurUserEntity = gson.fromJson(beanString, InspurUserEntity.class);
    String rowKey = inspurUserEntity.getDepartment_level1() + "_" + inspurUserEntity.getDepartment_level2() + "_" + inspurUserEntity.getId();
    return new Tuple2<>(new ImmutableBytesWritable(Bytes.toBytes(rowKey)),
            new KeyValue(Bytes.toBytes(rowKey), COLUMN_FAMILY, column, Bytes.toBytes(inspurUserEntity.getAge())));
}

The code above works, but only for a single column per row key. Any ideas on how to create an HFile with multiple columns for one row key?

apache-spark hbase hfile
2 Answers

0 votes

You can create multiple Tuple2<ImmutableBytesWritable, KeyValue> entries for one row, where the key stays the same and each KeyValue represents a single cell. Make sure the column qualifiers are sorted lexicographically, since HFileOutputFormat2 requires cells within a row to be in order. Then call saveAsNewAPIHadoopFile on the resulting JavaPairRDD<ImmutableBytesWritable, KeyValue>.
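The idea above can be sketched as follows. This is a minimal, pure-Java illustration (Spark and HBase left out so it runs standalone): a `Cell` record stands in for an HBase `KeyValue`, and `cellsForRow` is a hypothetical helper that emits one cell per column for a single row key, sorted by unsigned byte comparison of the qualifiers, the same ordering HBase's `Bytes.compareTo` uses. In the real job, each triple would become a `Tuple2<ImmutableBytesWritable, KeyValue>` emitted from `flatMapToPair` instead of `mapToPair`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiColumnRow {
    // Simple stand-in for an HBase cell: one (rowKey, qualifier, value) triple.
    record Cell(byte[] rowKey, byte[] qualifier, byte[] value) {}

    // For one row key, emit one cell per (qualifier, value) pair, sorted
    // lexicographically by qualifier bytes, as HFileOutputFormat2 expects.
    static List<Cell> cellsForRow(String rowKey, String[][] columns) {
        List<Cell> cells = new ArrayList<>();
        for (String[] col : columns) {
            cells.add(new Cell(rowKey.getBytes(StandardCharsets.UTF_8),
                               col[0].getBytes(StandardCharsets.UTF_8),
                               col[1].getBytes(StandardCharsets.UTF_8)));
        }
        // Unsigned byte-wise comparison mirrors HBase's Bytes.compareTo ordering.
        cells.sort((a, b) -> Arrays.compareUnsigned(a.qualifier(), b.qualifier()));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = cellsForRow("dept1_dept2_42",
                new String[][] {{"name", "alice"}, {"age", "30"}, {"city", "oslo"}});
        for (Cell c : cells) {
            System.out.println(new String(c.qualifier(), StandardCharsets.UTF_8));
        }
    }
}
```

With a `flatMapToPair`, each input line would map to this whole list of cells, all sharing the same `ImmutableBytesWritable` key, before the RDD-level `sortByKey` and `saveAsNewAPIHadoopFile`.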


0 votes

You have to use an array instead of a single ImmutableBytesWritable in your declaration.
