I have a variable of type HashMap<String, HashSet<Long>>
whose size can grow to about 100 MB. I need to write it to secondary storage.
Serialization is not an option because it is too slow for me. Is there a better way to dump the byte structure to a hard drive?
PS: I am only worried about the speed of writing to disk; slow reads are not a problem.
You can serialize it yourself. You can also compress the data to make it smaller.
// Requires: java.io.*, java.util.*, java.util.zip.*
public static void write(String filename, Map<String, Set<Long>> data) throws IOException {
    try (DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
            new DeflaterOutputStream(new FileOutputStream(filename))))) {
        dos.writeInt(data.size());
        for (Map.Entry<String, Set<Long>> entry : data.entrySet()) {
            dos.writeUTF(entry.getKey());
            Set<Long> value = entry.getValue();
            dos.writeInt(value.size());
            for (Long l : value) {
                dos.writeLong(l);
            }
        }
    }
}
To read it back, you do the same thing, but reading instead of writing.
public static Map<String, Set<Long>> read(String filename) throws IOException {
    Map<String, Set<Long>> ret = new LinkedHashMap<>();
    try (DataInputStream dis = new DataInputStream(new BufferedInputStream(
            new InflaterInputStream(new FileInputStream(filename))))) {
        for (int i = 0, size = dis.readInt(); i < size; i++) {
            String key = dis.readUTF();
            Set<Long> values = new LinkedHashSet<>();
            ret.put(key, values);
            for (int j = 0, size2 = dis.readInt(); j < size2; j++)
                values.add(dis.readLong());
        }
    }
    return ret;
}
public static void main(String... ignored) throws IOException {
    Map<String, Set<Long>> map = new LinkedHashMap<>();
    for (int i = 0; i < 20000; i++) {
        Set<Long> set = new LinkedHashSet<>();
        set.add(System.currentTimeMillis());
        map.put("key-" + i, set);
    }
    for (int i = 0; i < 5; i++) {
        long start = System.nanoTime();
        File file = File.createTempFile("delete", "me");
        write(file.getAbsolutePath(), map);
        Map<String, Set<Long>> map2 = read(file.getAbsolutePath());
        if (!map2.equals(map)) {
            throw new AssertionError();
        }
        long time = System.nanoTime() - start;
        System.out.printf("With %,d keys, the file used %.1f KB, took %.1f ms to write/read%n",
                map.size(), file.length() / 1024.0, time / 1e6);
        file.delete();
    }
}
prints
With 20,000 keys, the file used 44.1 KB, took 155.2 ms to write/read
With 20,000 keys, the file used 44.1 KB, took 84.9 ms to write/read
With 20,000 keys, the file used 44.1 KB, took 51.6 ms to write/read
With 20,000 keys, the file used 44.1 KB, took 21.4 ms to write/read
With 20,000 keys, the file used 44.1 KB, took 21.6 ms to write/read
So that is 20K entries in 21 ms, using only about 2.2 bytes per entry.
Use any suitable serialization library (some of them are fast; Google Protocol Buffers, for example, is fast and produces small messages) to get the data into a suitable form, then compress it in memory and dump the result to disk.
In most cases the disk I/O time will be your main bottleneck, so compressing to reduce it will help.
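A minimal sketch of the compress-in-memory idea, using the plain JDK Deflater in place of a serialization library (the map layout mirrors the DataOutputStream format from the first answer; the class and method names here are my own):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.zip.Deflater;

public class CompressedDump {
    // Serialize the map into an in-memory buffer, deflate it, then write
    // the compressed bytes to disk in a single sequential call.
    static void dumpCompressed(Map<String, Set<Long>> data, Path target) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(raw)) {
            dos.writeInt(data.size());
            for (Map.Entry<String, Set<Long>> e : data.entrySet()) {
                dos.writeUTF(e.getKey());
                dos.writeInt(e.getValue().size());
                for (long l : e.getValue())
                    dos.writeLong(l);
            }
        }
        // BEST_SPEED trades compression ratio for throughput, which suits
        // a workload bound by write speed.
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw.toByteArray());
        deflater.finish();
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished())
            packed.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        Files.write(target, packed.toByteArray()); // one disk write for the whole map
    }

    public static void main(String[] args) throws IOException {
        Map<String, Set<Long>> map = new LinkedHashMap<>();
        for (int i = 0; i < 1000; i++)
            map.put("key-" + i, new LinkedHashSet<>(Collections.singleton((long) i)));
        Path file = Files.createTempFile("dump", ".bin");
        dumpCompressed(map, file);
        System.out.println(Files.size(file) > 0);
        Files.deleteIfExists(file);
    }
}
```

Because the deflate step runs entirely in memory, the disk sees only one sequential write of the already-compressed bytes.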
We can do this using the Jackson API.
Prerequisite: add the following jars to your classpath. You can download them from here.
Here I will give an example for the data structure HashMap<String, HashSet<Long>>.
Step 1: Create a sample class (DataStructure) that holds your data structure as a variable.
public class DataStructure {
    public HashMap<String, HashSet<Long>> data = new HashMap<String, HashSet<Long>>();

    public DataStructure() {
    }

    public DataStructure(HashMap<String, HashSet<Long>> data) {
        this.data = data;
    }
}
Step 2: Create a method that stores the data structure to a file.
static void storeToFile(HashMap<String, HashSet<Long>> data) {
    try {
        String fileName = "test.txt";
        FileWriter fw = new FileWriter(fileName);
        DataStructure ds = new DataStructure(data);
        ObjectMapper objectMapper = new ObjectMapper();
        fw.write(objectMapper.writeValueAsString(ds));
        fw.close();
    } catch (IOException e) {
        System.out.println("storeToFile: " + e.getMessage());
    }
}
After step 2, your data structure will be stored in the specified file as a JSON string.
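Reading it back could look like the following sketch. It uses Jackson's ObjectMapper.readValue with the same wrapper-class shape as DataStructure above; the class and method names in this snippet are my own, and it assumes Jackson is on the classpath:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;

public class JsonRoundTrip {
    // Same wrapper shape as the DataStructure class above.
    public static class DataStructure {
        public HashMap<String, HashSet<Long>> data = new HashMap<>();
        public DataStructure() { }
        public DataStructure(HashMap<String, HashSet<Long>> data) { this.data = data; }
    }

    // Deserialize the JSON text written by storeToFile back into the wrapper class.
    static DataStructure readFromFile(String fileName) throws IOException {
        ObjectMapper objectMapper = new ObjectMapper();
        return objectMapper.readValue(new File(fileName), DataStructure.class);
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, HashSet<Long>> map = new HashMap<>();
        HashSet<Long> set = new HashSet<>();
        set.add(42L);
        map.put("key-0", set);
        File file = File.createTempFile("json", ".txt");
        new ObjectMapper().writeValue(file, new DataStructure(map));
        DataStructure ds = readFromFile(file.getAbsolutePath());
        System.out.println(ds.data.get("key-0").contains(42L));
        file.delete();
    }
}
```

Jackson rebuilds the HashMap and HashSet<Long> from the field's generic type, so the round trip preserves the original structure.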
I have also written a blog post about retrieval: https://tech-scribbler.blogspot.com/2020/04/how-can-you-store-any-complex-data.html