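I'm writing a string sanitiser, to be applied before writing data to a log file, with the following rules:
1. Specified characters are whitelisted (A-Za-z0-9 and <>[],.:_- and whitespace)
2. Specified characters are converted to a name within angle brackets (e.g. "," => "<comma>", "%" => "<percent>")
3. Anything else is converted to its Unicode code point within angle brackets (e.g. "φ" => "<U+03C6>", "π" => "<U+03C0>")
So far, rules 1 and 2 are working, but not rule 3. This is what I have so far: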
    public static string Safe(string s)
    {
        s = s
            .Replace("<", "ooopen-angle-brackettt") // must come first
            .Replace(">", "ccclose-angle-brackettt") // must come first
            //.Replace(",", "<comma>") // allow
            //.Replace(".", "<dot>") // allow
            //.Replace(":", "<colon>") // allow
            .Replace(";", "<semi-colon>")
            .Replace("{", "<open-curly-bracket>")
            .Replace("}", "<close-curly-bracket>")
            //.Replace("[", "<open-square-bracket>") // allow
            //.Replace("]", "<close-square-bracket>") // allow
            .Replace("(", "<open-bracket>")
            .Replace(")", "<close-bracket>")
            .Replace("!", "<exclamation-mark>")
            .Replace("@", "<at>")
            .Replace("#", "<hash>")
            .Replace("$", "<dollar>")
            .Replace("%", "<percent>")
            .Replace("^", "<hat>")
            .Replace("&", "<and>")
            .Replace("*", "<asterisk>")
            //.Replace("-", "<dash>") // allow
            //.Replace("_", "<underscore>") // allow
            .Replace("+", "<plus>")
            .Replace("=", "<equals>")
            .Replace("\\", "<forward-slash>")
            .Replace("\"", "<double-quote>")
            .Replace("'", "<single-quote>")
            .Replace("/", "<forward-slash>")
            .Replace("?", "<question-mark>")
            .Replace("|", "<pipe>")
            .Replace("~", "<tilde>")
            .Replace("`", "<backtick>")
            .Replace("ooopen-angle-brackettt", "<open-angle-bracket>")
            .Replace("ccclose-angle-brackettt", "<close-angle-bracket>");

        // all working up to here. broken below:
        Regex itemRegex = new Regex(@"[^A-Za-z0-9<>[\]:.,_\s-]", RegexOptions.Compiled);
        foreach (Match itemMatch in itemRegex.Matches(s))
        {
            // the reason for [0] and [1] is that I read that unicode consists of 2 characters
            s = s.Replace(
                itemMatch.ToString(),
                "<U+" +
                (((int)(itemMatch.ToString()).ToCharArray()[0]).ToString("X4")).ToString() +
                (((int)(itemMatch.ToString()).ToCharArray()[1]).ToString("X4")).ToString() +
                ">"
            );
        }
        return s;
    }
The regex part isn't capturing the Unicode characters in the input string. How can I fix this?
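I'm writing a string sanitiser, before writing data to a log file, with the following rules: specified characters are whitelisted (A-Za-z0-9 and <>[],.:_- and whitespace) ...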
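The problem is your assumption that a single Unicode value in a C# string turns into multiple items when the string is converted to a char array (char[]). If you hover over the string and char types in Visual Studio, the tooltip actually tells you how these types relate to Unicode: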
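As a quick illustration of that relationship, here is a minimal sketch of rule 3, assuming it runs after the named replacements have already been applied. The class name UnicodeEscapeSketch and the method EscapeNonWhitelisted are hypothetical and do not come from the question; the whitelist check uses char.IsWhiteSpace as a stand-in for the regex's \s, which may differ slightly for unusual whitespace characters. The point it shows is that "φ" occupies exactly one char, so index [1] does not exist, and only characters outside the Basic Multilingual Plane need two chars (a surrogate pair).

```csharp
using System;
using System.Text;

public static class UnicodeEscapeSketch
{
    // Hypothetical helper (not from the original post): copies whitelisted
    // characters through and replaces everything else with "<U+XXXX>".
    public static string EscapeNonWhitelisted(string s)
    {
        const string allowedPunctuation = "<>[],.:_-";
        var sb = new StringBuilder();

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            bool allowed =
                (c >= 'A' && c <= 'Z') ||
                (c >= 'a' && c <= 'z') ||
                (c >= '0' && c <= '9') ||
                allowedPunctuation.IndexOf(c) >= 0 ||
                char.IsWhiteSpace(c); // assumption: approximates \s in the regex

            if (allowed)
            {
                sb.Append(c);
                continue;
            }

            int codePoint;
            if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // Two chars form one code point only for surrogate pairs.
                codePoint = char.ConvertToUtf32(c, s[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // The common case: one char is one UTF-16 code unit.
                codePoint = c;
            }

            sb.Append("<U+").Append(codePoint.ToString("X4")).Append('>');
        }

        return sb.ToString();
    }

    public static void Main()
    {
        // "\u03C6" is φ: a single char, so ToCharArray()[1] would be out of range.
        Console.WriteLine("\u03C6".Length); // prints 1

        // φ (U+03C6) and π (U+03C0) are each escaped from a single char.
        Console.WriteLine(EscapeNonWhitelisted("a \u03C6 \u03C0")); // prints a <U+03C6> <U+03C0>
    }
}
```

Building the result in a StringBuilder, rather than calling s.Replace inside a loop over regex matches, also avoids re-scanning the whole string for every match.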