为什么将char'。''保存到文件中将其另存为'?'?

问题描述 投票:0回答:1

我了解了Huffman Coding,并试图申请。因此,我做了一个非常基本的文本阅读器,它只能打开和保存文件。并编写了一个装饰器,该装饰器可用于在保存之前压缩文本(使用Huffman Coding)。

存在一个我找不到的错误,经过大量调试后,我发现压缩文本时,结果字符可能在压缩文本中。例如,文本',-.:BCINSabcdefghiklmnoprstuvwy被压缩为앐낧淧翵�ဌ䤺큕㈀

我发现错误在于保存功能。当我保存压缩文本时,它将的每次出现更改为?。例如,保存앐낧淧翵�ဌ䤺큕㈀时得到앐낧淧翵?ဌ䤺큕㈀

[当我尝试读取保存的文件进行解压缩时,我得到了另一个字符串,因此解压缩失败。

更加困难的是,仅保存功能可以正常工作,但是在我的代码中使用它时却不起作用。函数看起来像这样:

public void save() throws IOException {
    FileWriter fileWriter = new FileWriter(this.filename);
    fileWriter.write(this.text);
    fileWriter.close();
}

[保存时的this.text앐낧淧翵�ဌ䤺큕㈀,但将其保存为앐낧淧翵?ဌ䤺큕㈀令人困惑。

如我之前所说,该函数在单独运行时可以正常工作,但在我的代码中不起作用。除了从代码中删除尽可能多的内容并将其放在此处,我无能为力。无论如何,可以在函数FileEditor::save处放置一个断点,并且在保存时您会发现this.text앐낧淧翵�ဌ䤺큕㈀,文件的内容为앐낧淧翵?ဌ䤺큕㈀

代码:

[FileEditor就在Main的正下方。

import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.PriorityQueue;
import java.util.TreeMap;

import static pack.BitsManipulator.CHAR_SIZE_IN_BITS;

public class Main {

    public static void main(String[] args) throws IOException {

        String text = " ',-.:BCINSabcdefghiklmnoprstuvwy";

        FileEditor fileEditor2 = new FileEditor("file.txt");
        HuffmanDecorator compressor = new HuffmanDecorator(fileEditor2);
        compressor.setText(text);
        System.out.println(compressor.getText());
        compressor.save();

    }
}

class FileEditor implements BasicFileEditor {
    private String filename;
    private String text;

    public FileEditor(String filename) throws IOException {

        this.filename = filename;
        File file = new File(filename);
        StringBuilder builder = new StringBuilder();

        if (!file.createNewFile()) {
            FileReader reader = new FileReader(file);

            int ch;
            while ((ch = reader.read()) != -1)
                builder.append((char) ch);
        }

        this.text = builder.toString();
    }

    @Override
    public String getText() {
        return text;
    }

    @Override
    public void setText(String text) {
        this.text = text;
    }

    @Override
    public void save() throws IOException {
        FileWriter fileWriter = new FileWriter(this.filename);
        fileWriter.write(this.text);
        fileWriter.close();
    }
}

interface BasicFileEditor {

    String getText();

    void setText(String text);
    void save() throws IOException;

}

abstract class FileEditorDecorator implements BasicFileEditor {
    FileEditor fileEditor;

    public FileEditorDecorator(FileEditor fileEditor) {
        this.fileEditor = fileEditor;
    }

    @Override
    public String getText() {
        return fileEditor.getText();
    }

    @Override
    public void setText(String text) {
        fileEditor.setText(text);
    }

    @Override
    public void save() throws IOException {
        String oldText = getText();
        setText(getModifiedText());
        fileEditor.save();
        setText(oldText);
    }

    protected abstract String getModifiedText();
}

class HuffmanDecorator extends FileEditorDecorator {

    public HuffmanDecorator(FileEditor fileEditor) {
        super(fileEditor);
    }

    @Override
    protected String getModifiedText() {
        HuffmanCodingCompressor compressor = new HuffmanCodingCompressor(getText());
        return compressor.getCompressedText();
    }
}

class HuffmanCodingCompressor {
    String text;

    public HuffmanCodingCompressor(String text) {
        this.text = text;
    }

    public String getCompressedText() {
        EncodingBuilder builder = new EncodingBuilder(text);

        return builder.getCompressedText();
    }
}

class Node implements Comparable<Node> {
    public Node left;
    public Node right;
    public int value;
    public Character character;

    public Node(Node left, Node right, int value) {
        this(left, right, value, null);
    }

    public Node(Node left, Node right, int value, Character character) {
        this.left = left;
        this.right = right;
        this.character = character;
        this.value = value;
    }

    @Override
    public int compareTo(Node o) {
        return this.value - o.value;
    }

    public boolean isLeafNode() {
        return left == null && right == null;
    }

    Node getLeft() {
        if (left == null)
            left = new Node(null, null, 0);
        return left;
    }

    Node getRight() {
        if (right == null)
            right = new Node(null, null, 0);
        return right;
    }
}

class EncodingBuilder {

    private String text;
    private Node encodingTree;
    private TreeMap<Character, String> encodingTable;

    public EncodingBuilder(String text) {
        this.text = text;
        buildEncodingTree();
        buildEncodingTableFromTree(encodingTree);
    }

    private void buildEncodingTableFromTree(Node encodingTree) {
        encodingTable = new TreeMap<>();
        buildEncodingTableFromTreeHelper(encodingTree, new StringBuilder());
    }


    public void buildEncodingTableFromTreeHelper(Node root, StringBuilder key) {
        if (root == null)
            return;

        if (root.isLeafNode()) {
            encodingTable.put(root.character, key.toString());
        } else {
            key.append('0');
            buildEncodingTableFromTreeHelper(root.left, key);
            key.deleteCharAt(key.length() - 1);

            key.append('1');
            buildEncodingTableFromTreeHelper(root.right, key);
            key.deleteCharAt(key.length() - 1);
        }
    }

    public void buildEncodingTree() {

        TreeMap<Character, Integer> freqArray = new TreeMap<>();

        for (int i = 0; i < text.length(); i++) {
            // improve here.
            char c = text.charAt(i);
            if (freqArray.containsKey(c)) {
                Integer freq = freqArray.get(c) + 1;
                freqArray.put(c, freq);
            } else {
                freqArray.put(c, 1);
            }
        }

        PriorityQueue<Node> queue = new PriorityQueue<>();

        for (Character c : freqArray.keySet())
            queue.add(new Node(null, null, freqArray.get(c), c));

        if (queue.size() == 1)
            queue.add(new Node(null, null, 0, '\0'));

        while (queue.size() > 1) {
            Node n1 = queue.poll();
            Node n2 = queue.poll();

            queue.add(new Node(n1, n2, n1.value + n2.value));
        }

        encodingTree = queue.poll();
    }

    public String getCompressedTextInBits() {
        StringBuilder bits = new StringBuilder();

        for (int i = 0; i < text.length(); i++)
            bits.append(encodingTable.get(text.charAt(i)));

        return bits.toString();
    }

    public String getCompressedText() {
        String compressedInBits = getCompressedTextInBits();
        int remainder = compressedInBits.length() % CHAR_SIZE_IN_BITS;
        int paddingNeededToBeDivisibleByCharSize = CHAR_SIZE_IN_BITS - remainder;
        String compressed = BitsManipulator.convertBitsToText(compressedInBits + "0".repeat(paddingNeededToBeDivisibleByCharSize));
        return compressed;
    }
}

class BitsManipulator {

    public static final int CHAR_SIZE_IN_BITS = 16;

    public static int bitsInStringToInt(String bits) {
        int result = 0;
        for (int i = 0; i < bits.length(); i++) {
            result *= 2;
            result += bits.charAt(i) - '0';
        }
        return result;
    }

    public static String convertBitsToText(String bits) {
        if (bits.length() % CHAR_SIZE_IN_BITS != 0)
            throw new NumberOfBitsNotDivisibleBySizeOfCharException();

        StringBuilder result = new StringBuilder();

        for (int i = 0; i < bits.length(); i += CHAR_SIZE_IN_BITS)
            result.append(asciiInBitsToChar(bits.substring(i, i + CHAR_SIZE_IN_BITS)));

        return result.toString();
    }

    public static char asciiInBitsToChar(String bits) {
        return (char) bitsInStringToInt(bits);
    }

    public static class NumberOfBitsNotDivisibleBySizeOfCharException extends RuntimeException {
    }
}
java save filewriter
1个回答
1
投票

。是Unicode替换字符U + FFFD。如果您以非unicode编码对其进行编码,则它将转换为常规问号,因为非unicode编码无法对所有unicode字符进行编码,从而提供了“安全性”(即,将所有内容都转换为我们无法编码)。

您似乎对二进制数据和文本数据之间的差异感到困惑,导致您将压缩数据视为韩文而不是二进制数据。您需要将数据存储(并观察)为bytes,而不是chars或Strings。

© www.soinside.com 2019 - 2024. All rights reserved.