Using PowerShell to write a file in UTF-8 without the BOM

Question  Votes: 208  Answers: 15

Out-File seems to force the BOM when using UTF-8:

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath

How can I write a file in UTF-8 with no BOM using PowerShell?

encoding powershell utf-8 byte-order-mark
15 Answers
205
votes

Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

$MyFile = Get-Content $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)
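
One way to confirm the result is to look at the first three bytes of the written file. The following is a minimal sketch that is not part of the original answer; the helper name Test-Utf8Bom and the temporary file name are made up for illustration.

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
    param([Parameter(Mandatory)] [string] $Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Write a small sample file (assumed name) with a BOM-less UTF-8 encoder.
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-sample.txt'
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
$sampleLines = [string[]]@(('caf' + [char]0xE9), 'second line')
[System.IO.File]::WriteAllLines($samplePath, $sampleLines, $Utf8NoBomEncoding)

# Check whether the file that was just written carries a BOM.
$hasBom = Test-Utf8Bom -Path $samplePath
```

Reading the whole file just to inspect three bytes is wasteful for large files; it keeps the sketch short.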

0
votes

Convert multiple files, selected by extension, to UTF-8 without BOM:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach($i in ls -recurse -filter "*.java") {
    $MyFile = Get-Content $i.fullname 
    [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
}

0
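votes

For whatever reason, the WriteAllLines calls were still producing a BOM for me, both with the BOM-less UTF8Encoding argument and without it. But the following worked for me: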

$bytes = gc -Encoding byte BOMthetorpedoes.txt
[IO.File]::WriteAllBytes("$(pwd)\BOMthetorpedoes.txt", $bytes[3..($bytes.length-1)])

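I had to make the file path absolute for it to work. Otherwise it wrote the file to my Desktop. Also, I suppose this only works if you know your BOM is 3 bytes. I have no idea how reliable it is to expect a given BOM format/length based on the encoding.

Also, as written, this probably only works if your file fits into a PowerShell array, which seems to have a length limit somewhat lower than [int32]::MaxValue on my machine.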
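
The "only if the BOM is 3 bytes" concern can be reduced by checking for the UTF-8 signature (EF BB BF) before stripping anything. The following is a sketch under that assumption, not the answerer's code; the function name Remove-Utf8Bom is hypothetical. It also resolves the path against PowerShell's current location, which avoids the file landing in an unexpected directory.

```powershell
# Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) only if one is present.
# Returns $true if a BOM was found and removed, $false otherwise.
function Remove-Utf8Bom {
    param([Parameter(Mandatory)] [string] $Path)

    # Resolve against PowerShell's current location; .NET may use a different directory.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    $hasBom = ($bytes.Length -ge 3 -and
               $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)

    if ($hasBom) {
        # Rewrite the file, skipping the first three bytes.
        $stream = [System.IO.File]::Create($fullPath)
        try {
            $stream.Write($bytes, 3, $bytes.Length - 3)
        } finally {
            $stream.Dispose()
        }
    }
    return $hasBom
}
```

Like the snippet above, this still loads the whole file into memory, so it is only suitable for files of moderate size.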


0
votes

If you want to use [System.IO.File]::WriteAllLines(), you should cast the second parameter to String[] (if the type of $MyFile is Object[]), and also specify an absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
[System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)

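If you want to use [System.IO.File]::WriteAllText(), you should sometimes pipe the second parameter through | Out-String | to add CRLFs to the end of each line explicitly (especially when you use it with ConvertTo-Csv):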

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
[System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)

Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

See: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM


-3
votes

This one works for me (use "Default" instead of "UTF8"):

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "Default" $MyPath

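The result is ASCII without BOM. (Strictly speaking, "Default" selects the system's active ANSI code page in Windows PowerShell, so the output is plain ASCII only when the content has no non-ASCII characters.)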


-3
votes

One could use the following to get UTF-8 without BOM (this holds only for content limited to ASCII characters):

$MyFile | Out-File -Encoding ASCII

-3
votes

Had the same problem. This did the trick for me:

$MyFile | Out-File -Encoding Oem $MyPath

When opening the file with Visual Studio Code or Notepad++, it shows as UTF-8.


68
votes

The proper way as of now is to use the solution recommended by @Roman Kuzmin in the comments to @M. Dudley's answer:

[IO.File]::WriteAllLines($filename, $content)

(I've also shortened it by stripping the unnecessary System namespace qualification; PowerShell resolves it automatically.)


39
votes

I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...

Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext

For me this results in a UTF-8 file without BOM, regardless of the source format.
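
This appears to work because ASCII is a byte-for-byte subset of UTF-8, so a file that contains only ASCII characters is also valid UTF-8. The sketch below is illustrative only (the sample strings are made up) and shows what happens to a character outside the ASCII range: .NET's ASCII encoder replaces it with "?".

```powershell
$ascii = [System.Text.Encoding]::ASCII
$utf8NoBom = New-Object System.Text.UTF8Encoding $False

# Pure ASCII text: both encoders produce identical bytes.
$plain = 'hello'
$sameBytes = ($ascii.GetBytes($plain) -join ',') -eq ($utf8NoBom.GetBytes($plain) -join ',')

# Text with one non-ASCII character (U+00E9, e with acute accent).
$accented = 'caf' + [char]0xE9
$asciiBytes = $ascii.GetBytes($accented)      # last byte is 0x3F ('?'): the character is lost
$utf8Bytes  = $utf8NoBom.GetBytes($accented)  # ends with 0xC3 0xA9: the character is kept
```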


26
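votes

Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition, UTF-8 without BOM is the default encoding.

To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

For convenience, here's the advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

  • You can use it just like Out-File in a pipeline.
  • Input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.

Example: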

(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath

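Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before the result is sent through the pipeline. This is necessary in order to be able to write back to the same file (update it in place). Generally, though, this technique is not advisable for two reasons: (a) the whole file must fit into memory, and (b) if the command is interrupted, data will be lost.

A note on memory use:

  • M. Dudley's own answer requires that the entire file contents be built up in memory first, which can be problematic with large files.
  • The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.

Source code of Out-FileUtf8NoBom (also available as an MIT-licensed Gist):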

<#
.SYNOPSIS
  Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

.DESCRIPTION
  Mimics the most important aspects of Out-File:
  * Input objects are sent to Out-String first.
  * -Append allows you to append to an existing file, -NoClobber prevents
    overwriting of an existing file.
  * -Width allows you to specify the line width for the text representations
     of input objects that aren't strings.
  However, it is not a complete implementation of all Out-File parameters:
  * Only a literal output path is supported, and only as a parameter.
  * -Force is not supported.

  Caveat: *All* pipeline input is buffered before writing output starts,
          but the string representations are generated and written to the target
          file one by one.

.NOTES
  The raison d'être for this advanced function is that, as of PowerShell v5,
  Out-File still lacks the ability to write UTF-8 files without a BOM:
  using -Encoding UTF8 invariably prepends a BOM.

#>
function Out-FileUtf8NoBom {

  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [Parameter(ValueFromPipeline)] $InputObject
  )

  #requires -version 3

  # Make sure that the .NET framework sees the same working dir. as PS
  # and resolve the input path to a full path.
  [System.IO.Directory]::SetCurrentDirectory($PWD) # Caveat: .NET Core doesn't support [Environment]::CurrentDirectory
  $LiteralPath = [IO.Path]::GetFullPath($LiteralPath)

  # If -NoClobber was specified, throw an exception if the target file already
  # exists.
  if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
    Throw [IO.IOException] "The file '$LiteralPath' already exists."
  }

  # Create a StreamWriter object.
  # Note that we take advantage of the fact that the StreamWriter class by default:
  # - uses UTF-8 encoding
  # - without a BOM.
  $sw = New-Object IO.StreamWriter $LiteralPath, $Append

  $htOutStringArgs = @{}
  if ($Width) {
    $htOutStringArgs += @{ Width = $Width }
  }

  # Note: By not using begin / process / end blocks, we're effectively running
  #       in the end block, which means that all pipeline input has already
  #       been collected in automatic variable $Input.
  #       We must use this approach, because using | Out-String individually
  #       in each iteration of a process block would format each input object
  #       with an individual header.
  try {
    $Input | Out-String -Stream @htOutStringArgs | % { $sw.WriteLine($_) }
  } finally {
    $sw.Dispose()
  }

}
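
When the input consists only of strings (for example, lines from Get-Content), the buffering caveat can be avoided with a process block, because no Out-String formatting is needed. The following is a rough sketch under that assumption, not part of the original answer; the function name Write-Utf8NoBomLine is hypothetical. Note that if the pipeline is aborted by an upstream error, the end block does not run and the file handle stays open until it is garbage-collected.

```powershell
# Hypothetical streaming variant for string input only (no Out-String formatting).
function Write-Utf8NoBomLine {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
    [Parameter(ValueFromPipeline)] [string] $Line
  )
  begin {
    # Resolve relative paths against PowerShell's current location, not .NET's.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)
    # StreamWriter(path, append) uses UTF-8 without a BOM by default.
    $sw = New-Object System.IO.StreamWriter $fullPath, $false
  }
  process {
    # Each line is written as soon as it arrives from the pipeline.
    $sw.WriteLine($Line)
  }
  end {
    $sw.Dispose()
  }
}
```

Usage would look like Get-Content $sourcePath | Write-Utf8NoBomLine $targetPath, where the two variables are placeholders and must refer to different files, since the source is still being read while the target is written.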

8
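votes

When using Set-Content instead of Out-File, you can specify the encoding Byte, which can be used to write a byte array to a file. Combined with a custom UTF8 encoding that does not emit the BOM, this gives the desired result: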

# This variable can be reused
$utf8 = New-Object System.Text.UTF8Encoding $false

$MyFile = Get-Content $MyPath -Raw
Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath

The difference from using [IO.File]::WriteAllLines() or similar is that it should work with any type of item and path, not only actual file paths.
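
Note that -Encoding Byte exists only in Windows PowerShell; PowerShell 6 and later replaced it with the -AsByteStream switch. A minimal sketch that covers both editions, using a made-up temporary file name and sample text:

```powershell
$utf8 = New-Object System.Text.UTF8Encoding $false
$targetPath = Join-Path ([System.IO.Path]::GetTempPath()) 'set-content-sample.txt'  # hypothetical path
$bytes = $utf8.GetBytes("first line`r`nsecond line`r`n")

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # PowerShell 6+: raw bytes are written with -AsByteStream.
    Set-Content -Value $bytes -AsByteStream -LiteralPath $targetPath
} else {
    # Windows PowerShell: raw bytes are written with -Encoding Byte.
    Set-Content -Value $bytes -Encoding Byte -LiteralPath $targetPath
}
```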


7
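votes

Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.

So in the above example it should simply be: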

$MyFile | Out-File -Encoding UTF8NoBOM $MyPath
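
For scripts that have to run on both Windows PowerShell and PowerShell 6+, one option is to branch on the version. This is a sketch, not an established pattern from the answers; the sample lines and the output file name are assumptions.

```powershell
$lines = 'alpha', 'beta'   # sample content
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'version-aware-sample.txt'  # hypothetical path

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # PowerShell 6+: the cmdlet supports BOM-less UTF-8 directly.
    $lines | Out-File -Encoding utf8NoBOM -LiteralPath $outPath
} else {
    # Windows PowerShell: fall back to .NET, which writes BOM-less UTF-8 by default.
    [System.IO.File]::WriteAllLines($outPath, [string[]]$lines)
}
```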

4
votes

This script converts all .txt files in DIRECTORY1 to UTF-8 without BOM and writes them to DIRECTORY2:

foreach ($i in ls -name DIRECTORY1\*.txt)
{
    $file_content = Get-Content "DIRECTORY1\$i";
    [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
}

1
votes

    [System.IO.FileInfo] $file = Get-Item -Path $FilePath 
    $sequenceBOM = New-Object System.Byte[] 3 
    $reader = $file.OpenRead() 
    $bytesRead = $reader.Read($sequenceBOM, 0, 3) 
    $reader.Dispose() 
    # A UTF-8 file with a BOM starts with the following three bytes. Hex: 0xEF 0xBB 0xBF, Decimal: 239 187 191
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191) 
    { 
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False) 
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding) 
        Write-Host "Remove UTF-8 BOM successfully" 
    } 
    Else 
    { 
        Write-Warning "Not UTF-8 BOM file" 
    }  

Source: How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell


0
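votes

One technique I use is to redirect output to an ASCII file with the Out-File cmdlet.

For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output is UTF-16, which SQLPlus does not recognize. To work around this: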

sqlplus -s / as sysdba "@create_sql_script.sql" |
Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed in another SQLPlus session without any Unicode worries:

sqlplus / as sysdba "@new_script.sql" |
tee new_script.log