Sanitize a string for logging

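I'm writing a string sanitizer that cleans data before it is written to a log file, using the following rules:

  1. Specified characters are whitelisted (A-Za-z0-9 plus <>[],.:_- and space)
  2. Specified characters are converted to their English name in angle brackets (e.g. ";" => "<semi-colon>", "%" => "<percent>")
  3. Anything else is converted to its Unicode code point in angle brackets (e.g. "φ" => "<U+03C6>", "π" => "<U+03C0>")

So far rules 1 and 2 are working, but not rule 3. This is what I have so far: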

    public static string Safe(string s)
    {
        s = s
            .Replace("<", "ooopen-angle-brackettt") // must come first
            .Replace(">", "ccclose-angle-brackettt") // must come first
            //.Replace(",", "<comma>") // allow
            //.Replace(".", "<dot>") // allow
            //.Replace(":", "<colon>") // allow
            .Replace(";", "<semi-colon>")
            .Replace("{", "<open-curly-bracket>")
            .Replace("}", "<close-curly-bracket>")
            //.Replace("[", "<open-square-bracket>") // allow
            //.Replace("]", "<close-square-bracket>") // allow
            .Replace("(", "<open-bracket>")
            .Replace(")", "<close-bracket>")
            .Replace("!", "<exclamation-mark>")
            .Replace("@", "<at>")
            .Replace("#", "<hash>")
            .Replace("$", "<dollar>")
            .Replace("%", "<percent>")
            .Replace("^", "<hat>")
            .Replace("&", "<and>")
            .Replace("*", "<asterisk>")
            //.Replace("-", "<dash>") // allow
            //.Replace("_", "<underscore>") // allow
            .Replace("+", "<plus>")
            .Replace("=", "<equals>")
            .Replace("\\", "<back-slash>")
            .Replace("\"", "<double-quote>")
            .Replace("'", "<single-quote>")
            .Replace("/", "<forward-slash>")
            .Replace("?", "<question-mark>")
            .Replace("|", "<pipe>")
            .Replace("~", "<tilde>")
            .Replace("`", "<backtick>")
            .Replace("ooopen-angle-brackettt", "<open-angle-bracket>")
            .Replace("ccclose-angle-brackettt", "<close-angle-bracket>");
        // all working up to here. broken below:

        Regex itemRegex = new Regex(@"[^A-Za-z0-9<>[\]:.,_\s-]", RegexOptions.Compiled);
        foreach (Match itemMatch in itemRegex.Matches(s))
        {
            // the reason for [0] and [1] is that I read that unicode consists of 2 characters
            s = s.Replace(
                itemMatch.ToString(),
                "<U+" +
                    (((int)(itemMatch.ToString()).ToCharArray()[0]).ToString("X4")).ToString() +
                    (((int)(itemMatch.ToString()).ToCharArray()[1]).ToString("X4")).ToString() +
                ">"
            );
        }
        return s;
    }

The regex part is not catching the Unicode characters in the input string. How can I fix this?

c# regex
1 Answer

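The problem was my assumption that converting a string to a char array (char[]) would split a single Unicode character into multiple items. If you hover over the string and char types in Visual Studio, the tooltips tell you how they relate to Unicode: a string represents text as a sequence of UTF-16 code units, and a char represents a single UTF-16 code unit. A character such as "φ" (U+03C6) is therefore exactly one char, so the regex does match it, but ToCharArray()[1] on that one-character match throws an IndexOutOfRangeException. Only characters above U+FFFF (emoji, for example) take two chars, stored as a surrogate pair.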
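
Given that, the smallest change to the original loop is to drop the [1] term and format only the first char, e.g. `((int)itemMatch.Value[0]).ToString("X4")`. Below is a minimal sketch, not the original poster's code, that applies all three rules in a single pass instead of chained Replace calls and placeholder strings. The class name `LogSanitizer` and the `<back-slash>` label are hypothetical; the other names mirror the Replace calls in the question. It also combines surrogate pairs with `char.ConvertToUtf32`, so a character above U+FFFF is reported as one code point rather than two surrogate halves.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: a sketch of the three logging rules, not a library API.
public static class LogSanitizer
{
    // Rule 2: characters replaced by an English name in angle brackets.
    // Names follow the question's Replace calls; "back-slash" is an assumption.
    private static readonly Dictionary<char, string> Names = new Dictionary<char, string>
    {
        ['<'] = "open-angle-bracket",  ['>'] = "close-angle-bracket",
        [';'] = "semi-colon",          ['{'] = "open-curly-bracket",
        ['}'] = "close-curly-bracket", ['('] = "open-bracket",
        [')'] = "close-bracket",       ['!'] = "exclamation-mark",
        ['@'] = "at",                  ['#'] = "hash",
        ['$'] = "dollar",              ['%'] = "percent",
        ['^'] = "hat",                 ['&'] = "and",
        ['*'] = "asterisk",            ['+'] = "plus",
        ['='] = "equals",              ['\\'] = "back-slash",
        ['"'] = "double-quote",        ['\''] = "single-quote",
        ['/'] = "forward-slash",       ['?'] = "question-mark",
        ['|'] = "pipe",                ['~'] = "tilde",
        ['`'] = "backtick",
    };

    // Rule 1: characters copied through unchanged. Input '<' and '>' are
    // deliberately NOT whitelisted here; they are named (as in the question)
    // so the generated <...> markers stay unambiguous.
    private static bool IsWhitelisted(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
        "[],.:_- ".IndexOf(c) >= 0;

    public static string Safe(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (IsWhitelisted(c))
            {
                sb.Append(c);
            }
            else if (Names.TryGetValue(c, out string name))
            {
                sb.Append('<').Append(name).Append('>');
            }
            else
            {
                // Rule 3. A BMP character such as 'φ' is a single char.
                // Only code points above U+FFFF span two chars (a surrogate pair).
                int codePoint;
                if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                {
                    codePoint = char.ConvertToUtf32(c, s[i + 1]);
                    i++; // skip the low surrogate that was just consumed
                }
                else
                {
                    codePoint = c; // BMP character, or a lone surrogate
                }
                // "X4" pads to at least four hex digits: 0x3C6 -> "03C6", 0x1F600 -> "1F600".
                sb.Append("<U+").Append(codePoint.ToString("X4")).Append('>');
            }
        }
        return sb.ToString();
    }
}
```

For example, `LogSanitizer.Safe("50% of φ")` returns `"50<percent> of <U+03C6>"`. Unlike the question's regex, which whitelists `\s`, this sketch lets only the space character through, so tabs and line breaks come out as `<U+0009>` and `<U+000A>`. That keeps each log entry on one line; widen `IsWhitelisted` if you want them kept.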

© www.soinside.com 2019 - 2024. All rights reserved.