Sanitizing a string for logging

Question

I'm writing a string sanitizer that cleans data before it is written to a log file, using the following rules:

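1. Specified characters are whitelisted (A-Za-z0-9 and <>[],.:_- and space)
2. Specified characters are converted to their English name in angle brackets (e.g. "," => "<comma>", "%" => "<percent>")
3. Anything else is converted to its Unicode number in angle brackets (e.g. "φ" => "<U+03C6>", "π" => "<U+03C0>")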
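Purely as a sketch of the three rules above (not the asker's code), they can also be applied in a single pass: each character is checked against the whitelist, then against a name table, and otherwise emitted as its code point. The class name LogSanitizer is hypothetical; the names in the table mostly follow the question's code (with "back-slash" assumed for the backslash); '<' and '>' are named because they delimit the replacements; and only the space character, not all whitespace, is whitelisted, which is an assumption.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class LogSanitizer // hypothetical name
{
    // Rule 2: characters replaced by an English name in angle brackets.
    // '<' and '>' are named too, because they delimit the replacements.
    private static readonly Dictionary<char, string> Names = new Dictionary<char, string>
    {
        ['<'] = "open-angle-bracket", ['>'] = "close-angle-bracket",
        [';'] = "semi-colon", ['{'] = "open-curly-bracket", ['}'] = "close-curly-bracket",
        ['('] = "open-bracket", [')'] = "close-bracket", ['!'] = "exclamation-mark",
        ['@'] = "at", ['#'] = "hash", ['$'] = "dollar", ['%'] = "percent",
        ['^'] = "hat", ['&'] = "and", ['*'] = "asterisk", ['+'] = "plus",
        ['='] = "equals", ['\\'] = "back-slash", ['"'] = "double-quote",
        ['\''] = "single-quote", ['/'] = "forward-slash", ['?'] = "question-mark",
        ['|'] = "pipe", ['~'] = "tilde", ['`'] = "backtick",
    };

    // Rule 1: characters copied through unchanged (space only, by assumption).
    private static bool IsWhitelisted(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
        "[],.:_- ".IndexOf(c) >= 0;

    public static string Safe(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (IsWhitelisted(c))
            {
                sb.Append(c);
            }
            else if (Names.TryGetValue(c, out string name))
            {
                sb.Append('<').Append(name).Append('>');
            }
            else
            {
                // Rule 3: emit the code point. A surrogate pair (two chars)
                // is combined into a single code point above U+FFFF.
                int codePoint;
                if (char.IsSurrogatePair(s, i))
                {
                    codePoint = char.ConvertToUtf32(s, i);
                    i++; // skip the low surrogate that was just consumed
                }
                else
                {
                    codePoint = c; // a BMP character (or a lone surrogate)
                }
                sb.Append("<U+")
                  .Append(codePoint.ToString("X4", CultureInfo.InvariantCulture))
                  .Append('>');
            }
        }
        return sb.ToString();
    }
}
```

Because each character is looked up once, the temporary placeholders for "<" and ">" in the chained Replace approach are no longer needed, and a replacement can never be re-processed by a later rule.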

So far rules 1 and 2 are working, but rule 3 is not. Here is what I have so far:

    public static string Safe(string s)
    {
        s = s
            .Replace("<", "ooopen-angle-brackettt") // must come first
            .Replace(">", "ccclose-angle-brackettt") // must come first
            //.Replace(",", "<comma>") // allow
            //.Replace(".", "<dot>") // allow
            //.Replace(":", "<colon>") // allow
            .Replace(";", "<semi-colon>")
            .Replace("{", "<open-curly-bracket>")
            .Replace("}", "<close-curly-bracket>")
            //.Replace("[", "<open-square-bracket>") // allow
            //.Replace("]", "<close-square-bracket>") // allow
            .Replace("(", "<open-bracket>")
            .Replace(")", "<close-bracket>")
            .Replace("!", "<exclamation-mark>")
            .Replace("@", "<at>")
            .Replace("#", "<hash>")
            .Replace("$", "<dollar>")
            .Replace("%", "<percent>")
            .Replace("^", "<hat>")
            .Replace("&", "<and>")
            .Replace("*", "<asterisk>")
            //.Replace("-", "<dash>") // allow
            //.Replace("_", "<underscore>") // allow
            .Replace("+", "<plus>")
            .Replace("=", "<equals>")
            .Replace("\\", "<back-slash>")
            .Replace("\"", "<double-quote>")
            .Replace("'", "<single-quote>")
            .Replace("/", "<forward-slash>")
            .Replace("?", "<question-mark>")
            .Replace("|", "<pipe>")
            .Replace("~", "<tilde>")
            .Replace("`", "<backtick>")
            .Replace("ooopen-angle-brackettt", "<open-angle-bracket>")
            .Replace("ccclose-angle-brackettt", "<close-angle-bracket>");
        // all working up to here. broken below:

        Regex itemRegex = new Regex(@"[^A-Za-z0-9<>[\]:.,_\s-]", RegexOptions.Compiled);
        foreach (Match itemMatch in itemRegex.Matches(s))
        {
            // the reason for [0] and [1] is that I read that unicode consists of 2 characters
            s = s.Replace(
                itemMatch.ToString(),
                "<U+" +
                    (((int)(itemMatch.ToString()).ToCharArray()[0]).ToString("X4")).ToString() +
                    (((int)(itemMatch.ToString()).ToCharArray()[1]).ToString("X4")).ToString() +
                ">"
            );
        }
        return s;
    }
The regex part is not catching the Unicode characters in the input string. How do I fix this?
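For reference, the character class in the question does match "φ"; what fails is the loop body. A match such as "φ" is a single char, so ToCharArray()[1] indexes past the end of the array and throws IndexOutOfRangeException. A minimal sketch of rule 3 that keeps the question's regex but formats only the one matched char, using Regex.Replace with a MatchEvaluator instead of string.Replace inside a foreach (the class and method names are hypothetical):

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscaper // hypothetical name
{
    // Same character class as the question: anything outside the whitelist
    // (after rule 2 has already run) is matched one UTF-16 code unit at a time.
    private static readonly Regex NotWhitelisted =
        new Regex(@"[^A-Za-z0-9<>[\]:.,_\s-]", RegexOptions.Compiled);

    // Rule 3 only: each match is a single char, so only index [0] is read.
    public static string EscapeRemaining(string s) =>
        NotWhitelisted.Replace(s, m =>
            "<U+" + ((int)m.Value[0]).ToString("X4", CultureInfo.InvariantCulture) + ">");
}
```

Since the pattern matches one char at a time, a character outside the Basic Multilingual Plane (for example U+1F600) comes out as two surrogate tokens, <U+D83D><U+DE00>, rather than one code point.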


Answer 1

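The problem was my assumption that converting a string to a char array (char[] in C#) would turn a single Unicode character into multiple items. If you hover over the string and char types in Visual Studio, the tooltips tell you how these types relate to Unicode: string represents text as a sequence of UTF-16 code units, and char represents a character as a single UTF-16 code unit. A character such as φ (U+03C6) is therefore one char, so ToCharArray()[1] indexes past the end of the array; only characters outside the Basic Multilingual Plane are stored as two chars (a surrogate pair).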
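To make the char versus code point distinction concrete, here is a small sketch with hypothetical helper names: one method counts the chars that ToCharArray() produces, and the other reads the code point at an index, combining a surrogate pair with char.IsSurrogatePair and char.ConvertToUtf32.

```csharp
using System.Globalization;

public static class Utf16Demo // hypothetical name
{
    // Number of UTF-16 code units (chars), i.e. the length of ToCharArray().
    public static int CharCount(string s) => s.ToCharArray().Length;

    // Formats the code point starting at index i as "U+XXXX".
    // A surrogate pair (two chars) is combined into one code point above U+FFFF.
    public static string CodePointAt(string s, int i)
    {
        int codePoint = char.IsSurrogatePair(s, i)
            ? char.ConvertToUtf32(s, i)
            : s[i];
        return "U+" + codePoint.ToString("X4", CultureInfo.InvariantCulture);
    }
}
```

For example, CharCount("φ") is 1 while CharCount("\U0001F600") is 2, and CodePointAt returns "U+03C6" and "U+1F600" respectively for index 0 of those strings.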
