How to convert (transliterate) a string from UTF-8 to ASCII (single byte) in C#?

Question (votes: 0, answers: 6)

I have a string object

"with multiple characters and even special characters"

I am trying to use the

UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();

objects to convert that string to ASCII. Could someone shed some light on this simple task, which has been eating up my afternoon?

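Edit 1: What we are trying to accomplish is to get rid of special characters, such as some of the special Windows apostrophes. The code I posted below as an answer doesn't take care of that. Basically,

O’Brien should become OBrien, where ’ is one of the special apostrophes.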
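A minimal sketch of why a plain ASCII conversion does not give OBrien, assuming the built-in `Encoding.ASCII` is used for the round trip (the `AsciiRoundTripDemo` class and `RoundTrip` method are hypothetical names): the ASCII encoding's default encoder fallback substitutes `?` for every character it cannot encode, so the curly apostrophe (U+2019) is replaced rather than removed.

```csharp
using System.Text;

public static class AsciiRoundTripDemo
{
    // Hypothetical helper: encode to ASCII bytes, then decode back to a string.
    // Encoding.ASCII uses a replacement fallback of "?", so each non-ASCII
    // char (such as U+2019) turns into '?' instead of disappearing.
    public static string RoundTrip(string input)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(input);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

For example, `AsciiRoundTripDemo.RoundTrip("O\u2019Brien")` returns `"O?Brien"`.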

c# encoding utf-8 ascii transliteration
6 answers
20 votes

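This is in response to your other question, which looks like it has been deleted... the point still stands.

This looks like a classic Unicode-to-ASCII problem. The trick is to find where it is happening.

.NET works fine with Unicode, assuming it is told the data is Unicode to begin with (or is left at the default).

My guess is that your receiving application can't handle it. So I'd probably use an ASCII encoding with an EncoderReplacementFallback of String.Empty: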

using System.IO;
using System.Text;

string inputString = GetInput();
// ASCII encoding whose encoder fallback replaces unencodable chars with "" (i.e. drops them)
Encoding asciiEncoding = Encoding.GetEncoding(
    "us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderReplacementFallback(string.Empty));

byte[] bAsciiString = asciiEncoding.GetBytes(inputString);

// Do something with bytes...
// can write to a file as is
File.WriteAllBytes(FILE_NAME, bAsciiString);
// or turn back into a "clean" string
string cleanString = asciiEncoding.GetString(bAsciiString);
// since only ASCII bytes remain, the (ASCII-compatible) default encoding decodes them the same way
Assert.AreEqual(cleanString, Encoding.Default.GetString(bAsciiString));

Of course, in the old days we just looped through and removed any chars greater than 127... well, those of us in the US did, at least. ;)
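That loop can be sketched as follows, assuming "greater than 127" means any UTF-16 code unit above U+007F; the `AsciiFilter` class and `StripNonAscii` method are hypothetical names. It drops characters outright, so O’Brien becomes OBrien, but accented letters vanish too (Göthe becomes Gthe) instead of being transliterated.

```csharp
using System.Text;

public static class AsciiFilter
{
    // Keeps only chars in the 7-bit ASCII range (U+0000..U+007F);
    // everything else, including accented letters, is dropped.
    public static string StripNonAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 127)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```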


12 votes

using System.Text;

ASCIIEncoding ascii = new ASCIIEncoding();
byte[] byteArray = Encoding.UTF8.GetBytes(sOriginal);
// Encoding.ASCII's default fallback turns each non-ASCII char into '?'
byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
string finalString = ascii.GetString(asciiArray);

Let me know if there is a simpler way.
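One likely simplification, sketched under the assumption that the input is an ordinary .NET `string`: strings are already UTF-16 in memory, so the detour through UTF-8 bytes and `Encoding.Convert` should not be needed, and encoding straight to ASCII ought to give the same result (non-ASCII characters still become `?`). `AsciiConversion`, `ViaUtf8` and `Direct` are hypothetical names used only for the comparison.

```csharp
using System.Text;

public static class AsciiConversion
{
    // The answer's route: UTF-16 string -> UTF-8 bytes -> ASCII bytes -> string.
    public static string ViaUtf8(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        byte[] asciiBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, utf8Bytes);
        return Encoding.ASCII.GetString(asciiBytes);
    }

    // Shorter route: encode the UTF-16 string directly to ASCII bytes and back.
    public static string Direct(string original)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(original));
    }
}
```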


7 votes

using System.Text;

namespace System
{
    public static class StringExtension
    {
        private static readonly ASCIIEncoding asciiEncoding = new ASCIIEncoding();

        // Note: ASCIIEncoding's default fallback replaces every non-ASCII char with '?'
        public static string ToAscii(this string dirty)
        {
            byte[] bytes = asciiEncoding.GetBytes(dirty);
            string clean = asciiEncoding.GetString(bytes);
            return clean;
        }
    }
}

(It lives in the System namespace so that it is available almost automatically on all our strings.)


6 votes

using System.Text;

// Create an encoding with a replacing encoder fallback: unencodable chars are replaced with "" (dropped)
var encoder = Encoding.GetEncoding("us-ascii", new EncoderReplacementFallback(string.Empty), new DecoderExceptionFallback());
string cleanString = encoder.GetString(encoder.GetBytes(dirtyString));
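Wrapped in a hypothetical helper class (`AsciiStripper` and `Clean` are illustrative names, not part of the answer), the same encoding can be built once and reused. Because the encoder fallback replaces unencodable characters with an empty string, the question's example O’Brien comes out as OBrien; accented letters are dropped as well, so Göthe becomes Gthe.

```csharp
using System.Text;

public static class AsciiStripper
{
    // us-ascii encoding whose encoder fallback replaces unencodable chars with "" (drops them).
    // The decoder never sees non-ASCII bytes here, so an exception fallback is safe.
    private static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback(string.Empty),
        new DecoderExceptionFallback());

    public static string Clean(string dirtyString)
    {
        return StrictAscii.GetString(StrictAscii.GetBytes(dirtyString));
    }
}
```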



2 votes

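You have to change the variable targetEncoding to whatever encoding you want.

// On .NET Core / .NET 5+, legacy code pages such as 874 must be registered first:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding targetEncoding = Encoding.GetEncoding(874); // Your target encoding
Encoding utf8 = Encoding.UTF8;
var stringBytes = utf8.GetBytes(Name);
var stringTargetBytes = Encoding.Convert(utf8, targetEncoding, stringBytes);
// Reinterpret each byte as one Latin-1 char, so the string holds exactly one char per byte
var ascii8BitRepresentAsCsString = Encoding.GetEncoding("Latin1").GetString(stringTargetBytes);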
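A sketch that makes the target encoding a parameter, as the answer suggests. Like the answer, it assumes ISO-8859-1 (Latin-1) for the final reinterpretation, because Latin-1 maps every byte 0x00 to 0xFF to the char with the same value. The check below uses only encodings built into every .NET runtime (ASCII and Latin-1); legacy Windows code pages such as 874 additionally need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` on .NET Core and .NET 5+. `SingleByteConversion` and `ToSingleByteString` are hypothetical names.

```csharp
using System.Text;

public static class SingleByteConversion
{
    // Converts text into a single-byte target encoding, then reinterprets the bytes
    // as Latin-1 so the result has exactly one char per target byte.
    // Chars the target cannot represent follow the target's encoder fallback
    // ('?' for the built-in ASCII encoding).
    public static string ToSingleByteString(string text, Encoding targetEncoding)
    {
        Encoding utf8 = Encoding.UTF8;
        byte[] utf8Bytes = utf8.GetBytes(text);
        byte[] targetBytes = Encoding.Convert(utf8, targetEncoding, utf8Bytes);
        return Encoding.GetEncoding("iso-8859-1").GetString(targetBytes);
    }
}
```

For example, `SingleByteConversion.ToSingleByteString("Göthe€", Encoding.ASCII)` returns `"G?the?"`, since ASCII can represent neither ö nor €.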



0 votes

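This is very useful if you need to feed data into another system that doesn't support Unicode. Using a StringBuilder and a simple loop keeps the code fast (in my test, processing an 8,000-character string 10,000 times took 1.1 seconds).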

Address:123 East Tāmaki – Tāmaki“ ” GötheФ€ O’Briens ‘hello’ he said!

Output ->

Address:123 East Tamaki - Tamaki" " Gothe O'Briens 'hello' he said!


using System.Globalization;
using System.Text;

/// <summary>
/// Transliterate all Unicode chars to their closest ASCII version.
/// Remove/fix accents, Maori macrons, typesetters' colons, dashes, curly quotes, apostrophes, invisible spaces, and other bad chars:
/// 1. remove accents but keep the letters
/// 2. fix punctuation to the closest ASCII punctuation
/// 3. remove any remaining non-ASCII chars
/// 4. also remove any invisible control chars
/// Option: remove line breaks or keep them.
/// </summary>
/// <example>"CHASSIS NO.:LC0CE4CB3N0345426 East Tāmaki – East Tāmaki“ ” GötheФ€ O’Briens ‘hello’ he said!" outputs "CHASSIS NO.:LC0CE4CB3N0345426 East Tamaki - East Tamaki" " Gothe O'Briens 'hello' he said!"</example>
public static string CleanUnicodeTransliterateToAscii(string text, bool removeLineBreaks)
{
    if (text == null) return null;

    // Decompose accented letters into the base letter plus the diacritic,
    // and map compatibility characters to their common equivalents
    text = text.Normalize(NormalizationForm.FormKD);

    var stringBuilder = new StringBuilder(text.Length);
    foreach (var c in text)
    {
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (c == '\r' || c == '\n')
        {
            // keep line breaks unless asked to remove them
            if (!removeLineBreaks) stringBuilder.Append(c);
        }
        else if (unicodeCategory == UnicodeCategory.Control)
        {
            // control char - skip
        }
        else if (unicodeCategory == UnicodeCategory.NonSpacingMark)
        {
            // diacritic mark/accent - skip
        }
        else if (c == '‘' || c == '’')
        {
            // single curly quote or apostrophe - add a straight apostrophe
            stringBuilder.Append('\'');
        }
        else if (unicodeCategory == UnicodeCategory.InitialQuotePunctuation
              || unicodeCategory == UnicodeCategory.FinalQuotePunctuation)
        {
            // any other quote - add a normal straight quote
            stringBuilder.Append('"');
        }
        else if (unicodeCategory == UnicodeCategory.DashPunctuation)
        {
            stringBuilder.Append('-');
        }
        else if (unicodeCategory == UnicodeCategory.SpaceSeparator)
        {
            // add a normal space
            stringBuilder.Append(' ');
        }
        else if (c > 127)
        {
            // skip any remaining non-ASCII chars (anything above U+007F)
        }
        else
        {
            stringBuilder.Append(c);
        }
    }
    return stringBuilder.ToString();
}
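The heart of this approach is the FormKD step, which can be tried in isolation. A minimal sketch, assuming only accent removal is wanted (`DiacriticStripper` and `RemoveDiacritics` are hypothetical names): compatibility decomposition splits ā into a plus a combining macron and ö into o plus a combining diaeresis, and the combining marks (category `NonSpacingMark`) are then dropped. Because it is the compatibility ("K") form, it also maps characters such as the ligature ﬁ (U+FB01) to plain fi.

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticStripper
{
    // Decompose with FormKD, then drop combining marks, keeping the base letters.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormKD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```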