Get the encoding of a file in Windows

Question

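This isn't really a programming question: is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I could write a small C# app, but I wanted to know if there is something already built in.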

Answer 1

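Open your file in the plain old Notepad that comes with Windows. When you click "Save As...", the Encoding drop-down in the dialog shows the encoding of the file.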

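Whatever encoding is selected by default is the file's current encoding. If it is UTF-8, you can change it to ANSI and click "Save" to change the encoding (or vice versa).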

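I realize there are many different types of encoding, but this was all I needed when I was told that our export files were in UTF-8 and they needed to be ANSI. It was a one-time export, so Notepad fit the bill for me.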
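For anyone who needs the same UTF-8 to ANSI conversion on more than one file, here is a minimal Python sketch of the re-encoding step. It assumes that "ANSI" means Windows-1252 (`cp1252`), which is only true on Western-locale systems; the function name `convert_encoding` and its parameters are made up for this illustration.

```python
def convert_encoding(src, dst, src_encoding="utf-8-sig", dst_encoding="cp1252"):
    """Re-encode a text file from src_encoding to dst_encoding.

    Raises UnicodeEncodeError if the text contains a character that
    dst_encoding cannot represent.
    """
    # "utf-8-sig" reads UTF-8 with or without a BOM and drops the BOM.
    # newline="" leaves CRLF/LF line endings untouched in both directions.
    with open(src, "r", encoding=src_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as f:
        f.write(text)
```

Usage would look like `convert_encoding("export.txt", "export-ansi.txt")`. Writing to a separate output file keeps the original intact if the conversion fails halfway.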

FYI: From my understanding, "Unicode" (as listed in Notepad) is a misnomer for UTF-16. More on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicdoe

Answer 2

Here is some C code for reliable ASCII, BOM, and UTF-8 detection: https://unicodebook.readthedocs.io/guess_encoding.html

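Only ASCII, UTF-8, and encodings that use a BOM (UTF-7 with BOM, UTF-8 with BOM, UTF-16, and UTF-32) have reliable algorithms for determining the encoding of a document. For all other encodings, you have to trust heuristics based on statistics.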
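To make the "reliable" cases concrete, here is a rough Python sketch (not the C code from the link): check for a BOM first, then test whether the bytes are pure ASCII, then whether they are valid UTF-8. The function name `guess_encoding` and the returned labels are choices made for this example, and the UTF-7 BOM is left out for brevity.

```python
import codecs

# BOM signatures, longest first, so that UTF-32 LE (FF FE 00 00) is not
# mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Return a best-effort encoding label for raw file bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not ASCII, not valid UTF-8, no BOM: only heuristics can help.
        return "unknown"
```

It would be called on a file's raw bytes, for example `guess_encoding(open(path, "rb").read())`. An "unknown" result is exactly the case where statistical heuristics take over.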

EDIT:

A PowerShell version of a C# answer from: Effective way to find any file's Encoding. It only works with signatures (BOMs).

# encoding.ps1
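# Reports the encoding that StreamReader detects from a file's byte order mark (BOM).
param([Parameter(ValueFromPipeline=$True)] $filename)
process {
  # The third argument ($true) turns on BOM detection; without a BOM the
  # reader keeps the fallback, [System.Text.Encoding]::Default.
  $reader = [System.IO.StreamReader]::new($filename, [System.Text.Encoding]::default,$true)
  # Force a read so that CurrentEncoding reflects the detected BOM.
  $peek = $reader.Peek()
  $encoding = $reader.currentencoding
  $reader.close()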
  [pscustomobject]@{Name=split-path $filename -leaf
                BodyName=$encoding.BodyName
                EncodingName=$encoding.EncodingName}
}


PS C:\> .\encoding.ps1 chinese8.txt

Name         BodyName EncodingName
----         -------- ------------
chinese8.txt utf-8    Unicode (UTF-8)

Answer 3

The (Linux) command-line tool 'file' is available on Windows via GnuWin32:

http://gnuwin32.sourceforge.net/packages/file.htm

If you have git installed, it is located in C:\Program Files\git\usr\bin.

Example:

    C:\Users\SH\Downloads\SquareRoot>file *
    _UpgradeReport_Files;         directory
    Debug;                        directory
    duration.h;                   ASCII C++ program text, with CRLF line terminators
    ipch;                         directory
    main.cpp;                     ASCII C program text, with CRLF line terminators
    Precision.txt;                ASCII text, with CRLF line terminators
    Release;                      directory
    Speed.txt;                    ASCII text, with CRLF line terminators
    SquareRoot.sdf;               data
    SquareRoot.sln;               UTF-8 Unicode (with BOM) text, with CRLF line terminators
    SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
    SquareRoot.suo;               CDF V2 Document, corrupt: Cannot read summary info
    SquareRoot.vcproj;            XML  document text
    SquareRoot.vcxproj;           XML document text
    SquareRoot.vcxproj.filters;   XML document text
    SquareRoot.vcxproj.user;      XML document text
    squarerootmethods.h;          ASCII C program text, with CRLF line terminators
    UpgradeLog.XML;               XML  document text

    C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
    _UpgradeReport_Files;         binary
    Debug;                        binary
    duration.h;                   us-ascii
    ipch;                         binary
    main.cpp;                     us-ascii
    Precision.txt;                us-ascii
    Release;                      binary
    Speed.txt;                    us-ascii
    SquareRoot.sdf;               binary
    SquareRoot.sln;               utf-8
    SquareRoot.sln.docstates.suo; binary
    SquareRoot.suo;               CDF V2 Document, corrupt: Cannot read summary infobinary
    SquareRoot.vcproj;            us-ascii
    SquareRoot.vcxproj;           utf-8
    SquareRoot.vcxproj.filters;   utf-8
    SquareRoot.vcxproj.user;      utf-8
    squarerootmethods.h;          us-ascii
    UpgradeLog.XML;               us-ascii
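If you want to use that output from a script, here is a small Python sketch that runs `file --mime-encoding` and turns the result into a dictionary. The helper names `parse_mime_encoding` and `file_encodings` are invented for this example. It assumes `file` is on the PATH (or that you pass the full path to file.exe) and accepts both the `;` separator shown in the listing above and the `:` separator that other builds of `file` print.

```python
import re
import shutil
import subprocess

# "<name>" then ':' or ';', then whitespace, then the reported charset.
_LINE = re.compile(r"^(.+?)[:;]\s+(\S.*)$")


def parse_mime_encoding(output: str) -> dict:
    """Map file names to charsets from `file --mime-encoding` output."""
    result = {}
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            result[match.group(1)] = match.group(2).strip()
    return result


def file_encodings(paths, file_exe="file"):
    """Run `file --mime-encoding` on the given paths and parse the output."""
    exe = shutil.which(file_exe)
    if exe is None:
        raise FileNotFoundError(f"{file_exe!r} was not found on PATH")
    completed = subprocess.run(
        [exe, "--mime-encoding", *paths],
        capture_output=True, text=True, check=True,
    )
    return parse_mime_encoding(completed.stdout)
```

Note that `subprocess` does not expand `*` the way the shell does, so the paths have to be listed explicitly (for example via `glob.glob("*")`).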

Answer 4

If you have "git" or "Cygwin" on your Windows machine, go to the folder where your file is and run the command:

file *

This will give you the encoding details of all the files in that folder.

Answer 5

Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker (an EXE is also available).

Answer 6

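Here is my take on detecting the Unicode family of text encodings via the BOM. The accuracy of this method is low: it only works on text files (specifically Unicode files), and it defaults to ascii when no BOM is present (like most text editors; the default would be UTF8 if you want to match the HTTP/web ecosystem).

Update 2018: I no longer recommend this method. I recommend using file.exe from GIT or the *nix tools, as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.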

# from https://gist.github.com/zommarin/1480974
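function Get-FileEncoding($Path) {
    # Read up to the first 4 bytes. This is Windows PowerShell syntax;
    # PowerShell 6+ uses -AsByteStream instead of -Encoding byte.
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    if(!$bytes) { return 'utf8' }

    # The UTF-32 signatures are tested before the UTF-16 ones because the
    # UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }
        '^2b2f76'   { return 'utf7' }
        '^fffe0000' { return 'utf32' }
        '^0000feff' { return 'bigendianutf32' }
        '^fffe'     { return 'unicode' }
        '^feff'     { return 'bigendianunicode' }
        default     { return 'ascii' }
    }
}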

dir ~\Documents\WindowsPowershell -File | 
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} | 
    ft -AutoSize

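Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem call only checks known text files, and when you are only looking for "bad encodings" from a known list of tools. (For example, SQL Management Studio defaults to UTF16, which broke GIT auto-cr-lf for Windows, which was the default for many years.)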
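As an illustration of that narrower use case, here is a Python sketch that walks a folder and reports only known text files that start with a UTF-16 BOM. The function name `find_utf16_files` and the default extension list are assumptions for this example; adjust them to the tools you are checking.

```python
import os

# UTF-16 little-endian and big-endian byte order marks.
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def find_utf16_files(root, extensions=(".sql", ".txt")):
    """Yield paths under root whose first two bytes are a UTF-16 BOM.

    Only files with one of the given extensions are opened. A UTF-32 LE
    file also starts with FF FE, so it would be reported here as well.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if f.read(2) in UTF16_BOMS:
                    yield path
```

For example, `list(find_utf16_files("."))` collects the offending files below the current directory.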

Answer 7

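I wrote the #4 answer (at the time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from PowerShell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).

Add this to your profile.ps1: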

$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe

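Use it like this: file.exe --mime-encoding *. You must include .exe in the command for the PS alias to work.

If you have not customized your PowerShell profile.ps1 yet, I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0 and save it to ~\Documents\WindowsPowerShell. It is safe to use on a computer without git, but it will write warnings when git is not found.

Including the .exe in the command is also how I use C:\WINDOWS\system32\where.exe from PowerShell, along with many other OS CLI commands that are "hidden by default" by PowerShell, *shrug*.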

Answer 8

You can use a free utility called Encoding Recognizer (requires Java). You can find it at http://mindprod.com/products2.html#ENCODINGRECOGNISER

Answer 9

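Similar to the Notepad solution listed above, you can also open the file in Visual Studio, if you are using it. In Visual Studio, select "File > Advanced Save Options..."

The "Encoding:" combo box tells you specifically which encoding is currently being used for the file. It lists many more text encodings than Notepad does, so it is useful when dealing with various files from around the world and whatever else.

Just like in Notepad, you can also change the encoding from the list of options there, and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).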

Answer 10

The only way I have found to do this is with VIM or Notepad++.
