我了解了Huffman Coding
,并试图申请。因此,我做了一个非常基本的文本阅读器,它只能打开和保存文件。并编写了一个装饰器,该装饰器可用于在保存之前压缩文本(使用Huffman Coding
)。
存在一个我找不到的错误,经过大量调试后,我发现压缩文本时,结果字符�
可能在压缩文本中。例如,文本',-.:BCINSabcdefghiklmnoprstuvwy
被压缩为앐낧淧翵�ဌ䤺큕㈀
。
我发现错误在于保存功能。当我保存压缩文本时,它将�
的每次出现更改为?
。例如,保存앐낧淧翵�ဌ䤺큕㈀
时得到앐낧淧翵?ဌ䤺큕㈀
。
[当我尝试读取保存的文件进行解压缩时,我得到了另一个字符串,因此解压缩失败。
更加困难的是,仅保存功能可以正常工作,但是在我的代码中使用它时却不起作用。函数看起来像这样:
public void save() throws IOException {
FileWriter fileWriter = new FileWriter(this.filename);
fileWriter.write(this.text);
fileWriter.close();
}
[保存时的this.text
为앐낧淧翵�ဌ䤺큕㈀
,但将其保存为앐낧淧翵?ဌ䤺큕㈀
令人困惑。
如我之前所说,该函数在单独运行时可以正常工作,但在我的代码中不起作用。除了从代码中删除尽可能多的内容并将其放在此处,我无能为力。无论如何,可以在函数FileEditor::save
处放置一个断点,并且在保存时您会发现this.text
为앐낧淧翵�ဌ䤺큕㈀
,文件的内容为앐낧淧翵?ဌ䤺큕㈀
。
代码:
[FileEditor
就在Main
的正下方。
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.PriorityQueue;
import java.util.TreeMap;
import static pack.BitsManipulator.CHAR_SIZE_IN_BITS;
public class Main {
public static void main(String[] args) throws IOException {
String text = " ',-.:BCINSabcdefghiklmnoprstuvwy";
FileEditor fileEditor2 = new FileEditor("file.txt");
HuffmanDecorator compressor = new HuffmanDecorator(fileEditor2);
compressor.setText(text);
System.out.println(compressor.getText());
compressor.save();
}
}
class FileEditor implements BasicFileEditor {
private String filename;
private String text;
public FileEditor(String filename) throws IOException {
this.filename = filename;
File file = new File(filename);
StringBuilder builder = new StringBuilder();
if (!file.createNewFile()) {
FileReader reader = new FileReader(file);
int ch;
while ((ch = reader.read()) != -1)
builder.append((char) ch);
}
this.text = builder.toString();
}
@Override
public String getText() {
return text;
}
@Override
public void setText(String text) {
this.text = text;
}
@Override
public void save() throws IOException {
FileWriter fileWriter = new FileWriter(this.filename);
fileWriter.write(this.text);
fileWriter.close();
}
}
interface BasicFileEditor {
String getText();
void setText(String text);
void save() throws IOException;
}
abstract class FileEditorDecorator implements BasicFileEditor {
FileEditor fileEditor;
public FileEditorDecorator(FileEditor fileEditor) {
this.fileEditor = fileEditor;
}
@Override
public String getText() {
return fileEditor.getText();
}
@Override
public void setText(String text) {
fileEditor.setText(text);
}
@Override
public void save() throws IOException {
String oldText = getText();
setText(getModifiedText());
fileEditor.save();
setText(oldText);
}
protected abstract String getModifiedText();
}
class HuffmanDecorator extends FileEditorDecorator {
public HuffmanDecorator(FileEditor fileEditor) {
super(fileEditor);
}
@Override
protected String getModifiedText() {
HuffmanCodingCompressor compressor = new HuffmanCodingCompressor(getText());
return compressor.getCompressedText();
}
}
class HuffmanCodingCompressor {
String text;
public HuffmanCodingCompressor(String text) {
this.text = text;
}
public String getCompressedText() {
EncodingBuilder builder = new EncodingBuilder(text);
return builder.getCompressedText();
}
}
class Node implements Comparable<Node> {
public Node left;
public Node right;
public int value;
public Character character;
public Node(Node left, Node right, int value) {
this(left, right, value, null);
}
public Node(Node left, Node right, int value, Character character) {
this.left = left;
this.right = right;
this.character = character;
this.value = value;
}
@Override
public int compareTo(Node o) {
return this.value - o.value;
}
public boolean isLeafNode() {
return left == null && right == null;
}
Node getLeft() {
if (left == null)
left = new Node(null, null, 0);
return left;
}
Node getRight() {
if (right == null)
right = new Node(null, null, 0);
return right;
}
}
class EncodingBuilder {
private String text;
private Node encodingTree;
private TreeMap<Character, String> encodingTable;
public EncodingBuilder(String text) {
this.text = text;
buildEncodingTree();
buildEncodingTableFromTree(encodingTree);
}
private void buildEncodingTableFromTree(Node encodingTree) {
encodingTable = new TreeMap<>();
buildEncodingTableFromTreeHelper(encodingTree, new StringBuilder());
}
public void buildEncodingTableFromTreeHelper(Node root, StringBuilder key) {
if (root == null)
return;
if (root.isLeafNode()) {
encodingTable.put(root.character, key.toString());
} else {
key.append('0');
buildEncodingTableFromTreeHelper(root.left, key);
key.deleteCharAt(key.length() - 1);
key.append('1');
buildEncodingTableFromTreeHelper(root.right, key);
key.deleteCharAt(key.length() - 1);
}
}
public void buildEncodingTree() {
TreeMap<Character, Integer> freqArray = new TreeMap<>();
for (int i = 0; i < text.length(); i++) {
// improve here.
char c = text.charAt(i);
if (freqArray.containsKey(c)) {
Integer freq = freqArray.get(c) + 1;
freqArray.put(c, freq);
} else {
freqArray.put(c, 1);
}
}
PriorityQueue<Node> queue = new PriorityQueue<>();
for (Character c : freqArray.keySet())
queue.add(new Node(null, null, freqArray.get(c), c));
if (queue.size() == 1)
queue.add(new Node(null, null, 0, '\0'));
while (queue.size() > 1) {
Node n1 = queue.poll();
Node n2 = queue.poll();
queue.add(new Node(n1, n2, n1.value + n2.value));
}
encodingTree = queue.poll();
}
public String getCompressedTextInBits() {
StringBuilder bits = new StringBuilder();
for (int i = 0; i < text.length(); i++)
bits.append(encodingTable.get(text.charAt(i)));
return bits.toString();
}
public String getCompressedText() {
String compressedInBits = getCompressedTextInBits();
int remainder = compressedInBits.length() % CHAR_SIZE_IN_BITS;
int paddingNeededToBeDivisibleByCharSize = CHAR_SIZE_IN_BITS - remainder;
String compressed = BitsManipulator.convertBitsToText(compressedInBits + "0".repeat(paddingNeededToBeDivisibleByCharSize));
return compressed;
}
}
class BitsManipulator {
public static final int CHAR_SIZE_IN_BITS = 16;
public static int bitsInStringToInt(String bits) {
int result = 0;
for (int i = 0; i < bits.length(); i++) {
result *= 2;
result += bits.charAt(i) - '0';
}
return result;
}
public static String convertBitsToText(String bits) {
if (bits.length() % CHAR_SIZE_IN_BITS != 0)
throw new NumberOfBitsNotDivisibleBySizeOfCharException();
StringBuilder result = new StringBuilder();
for (int i = 0; i < bits.length(); i += CHAR_SIZE_IN_BITS)
result.append(asciiInBitsToChar(bits.substring(i, i + CHAR_SIZE_IN_BITS)));
return result.toString();
}
public static char asciiInBitsToChar(String bits) {
return (char) bitsInStringToInt(bits);
}
public static class NumberOfBitsNotDivisibleBySizeOfCharException extends RuntimeException {
}
}
。是Unicode替换字符U + FFFD。如果您以非unicode编码对其进行编码,则它将转换为常规问号,因为非unicode编码无法对所有unicode字符进行编码,从而提供了“安全性”(即,将所有内容都转换为我们无法编码)。
您似乎对二进制数据和文本数据之间的差异感到困惑,导致您将压缩数据视为韩文而不是二进制数据。您需要将数据存储(并观察)为bytes,而不是chars或Strings。