Out-File seems to force a BOM when using UTF-8:
$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath
How can I write a file in UTF-8 without a BOM using PowerShell?
Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:
$MyFile = Get-Content $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)
To convert multiple files, selected by extension, to UTF-8 without a BOM:
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach($i in ls -recurse -filter "*.java") {
$MyFile = Get-Content $i.fullname
[System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
}
For whatever reason, the WriteAllLines call still produced a BOM for me, both with the BOM-less UTF8Encoding argument and without it. But the following worked for me:
$bytes = gc -Encoding byte BOMthetorpedoes.txt
[IO.File]::WriteAllBytes("$(pwd)\BOMthetorpedoes.txt", $bytes[3..($bytes.length-1)])
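I had to make the file path absolute for this to work; otherwise it wrote the file to my Desktop. Also, I suppose this only works if you know the BOM is 3 bytes long. I have no idea how reliable it is to expect a given BOM format or length based on the encoding.
Also, as written, this probably only works if your file fits into a PowerShell array, whose length limit seems to be lower than [int32]::MaxValue on my machine.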
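The concern about blindly dropping three bytes can be addressed by checking for the UTF-8 BOM (0xEF 0xBB 0xBF) first. The following is only a sketch: the function name Remove-Utf8Bom is hypothetical and not part of the answer above, and it still reads the whole file into memory.

```powershell
# Hypothetical helper (not from the original answer): removes a UTF-8 BOM
# from a file only when the file actually starts with one.
function Remove-Utf8Bom {
    param([Parameter(Mandatory)] [string] $Path)

    # Resolve to a full path first: .NET methods use the process working
    # directory, which can differ from PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # ReadAllBytes returns a byte[] directly.
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    $hasBom = ($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)
    if (-not $hasBom) { return $false }

    # Rewrite the file, skipping the first three bytes.
    $stream = [System.IO.File]::Create($fullPath)
    try {
        $stream.Write($bytes, 3, $bytes.Length - 3)
    }
    finally {
        $stream.Dispose()
    }
    return $true
}
```

Calling Remove-Utf8Bom -Path .\BOMthetorpedoes.txt returns $true when a BOM was removed and $false when the file was left untouched, so running it twice is harmless.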
If you want to use [System.IO.File]::WriteAllLines(), you should cast the second parameter to String[] (if the type of $MyFile is Object[]), and also use $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath) to specify an absolute path, like this:
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
[System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)
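If you want to use [System.IO.File]::WriteAllText(), you sometimes need to pipe the second parameter through | Out-String | so that a CRLF is explicitly added to the end of each line (especially when you use it with ConvertTo-Csv):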
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
[System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)
Or you can use [Text.Encoding]::UTF8.GetBytes() together with Set-Content -Encoding Byte:
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"
See: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM
This one works for me (using "Default" instead of "UTF8"):
$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "Default" $MyPath
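The result is ASCII without a BOM.
The following gives UTF-8 without a BOM, provided the content is pure ASCII (non-ASCII characters are replaced with ?):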
$MyFile | Out-File -Encoding ASCII
I had the same problem. This worked for me:
$MyFile | Out-File -Encoding Oem $MyPath
When the file is opened with Visual Studio Code or Notepad++, it shows as UTF-8.
The proper way to do this now is to use the solution recommended by @Roman Kuzmin in the comments to @M. Dudley's answer:
[IO.File]::WriteAllLines($filename, $content)
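(I have also shortened it by stripping the unnecessary System namespace qualification; it is substituted automatically by default.)
I don't think this is really UTF, but I found a very simple solution that seems to work...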
Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext
For me, this produces a UTF-8 file without a BOM regardless of the source format.
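Whether -Encoding ASCII really yields usable UTF-8 depends on the content: ASCII bytes are valid UTF-8, but anything outside the ASCII range cannot be represented. The short sketch below uses only .NET encoding classes to illustrate the difference; the sample string is an arbitrary choice.

```powershell
# "café": the last character (U+00E9) is outside the ASCII range.
$text = 'caf' + [char]0x00E9

$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf8Bytes  = (New-Object System.Text.UTF8Encoding $false).GetBytes($text)

# ASCII cannot represent U+00E9, so the encoder substitutes '?' (0x3F).
$asciiRoundTrip = [System.Text.Encoding]::ASCII.GetString($asciiBytes)

# UTF-8 encodes U+00E9 as two bytes (0xC3 0xA9) and round-trips losslessly.
$utf8RoundTrip = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

"ASCII round trip: $asciiRoundTrip ($($asciiBytes.Length) bytes)"
"UTF-8 round trip: $utf8RoundTrip ($($utf8Bytes.Length) bytes)"
```

So the ASCII route is only safe for files known to contain nothing but ASCII characters; for anything else, data is silently lost.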
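Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition, UTF-8 without a BOM is the default encoding.
To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):
For convenience, here is the advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:
* you can use it in a pipeline just like Out-File;
* input objects that aren't strings are sent through Out-String first, just as with Out-File.
Example: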
(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath
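Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before the result is sent through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for two reasons: (a) the whole file must fit into memory, and (b) if the command is interrupted, data will be lost.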
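One way to soften both drawbacks is to write to a temporary file and replace the original only once the write has finished. The sketch below is an assumption-laden illustration, not part of the answer: the function name Convert-ToUtf8NoBomSafely and the ".tmp" suffix are made up, and it assumes the input is UTF-8 (with or without a BOM), because [System.IO.File]::ReadLines decodes as UTF-8 by default.

```powershell
# Hypothetical helper: rewrites a file as BOM-less UTF-8 via a temporary file.
function Convert-ToUtf8NoBomSafely {
    param([Parameter(Mandatory)] [string] $Path)

    $fullPath  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $tempPath  = "$fullPath.tmp"     # assumed naming scheme for the temp file
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false

    # ReadLines enumerates the file lazily and WriteAllLines consumes it
    # line by line, so the whole file is not held in memory at once.
    [System.IO.File]::WriteAllLines(
        $tempPath,
        [System.IO.File]::ReadLines($fullPath),
        $utf8NoBom)

    # Replace the original only after the new file was written completely.
    Move-Item -LiteralPath $tempPath -Destination $fullPath -Force
}
```

If the command is interrupted during the write, the original file is still intact and only the temporary file is incomplete.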
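A note on memory use: all pipeline input is buffered before writing starts, but the string representations are then generated and written to the target file one by one.
The source code of Out-FileUtf8NoBom (also available as an MIT-licensed Gist):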
<#
.SYNOPSIS
Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).
.DESCRIPTION
Mimics the most important aspects of Out-File:
* Input objects are sent to Out-String first.
* -Append allows you to append to an existing file, -NoClobber prevents
overwriting of an existing file.
* -Width allows you to specify the line width for the text representations
of input objects that aren't strings.
However, it is not a complete implementation of all Out-File parameters:
* Only a literal output path is supported, and only as a parameter.
* -Force is not supported.
Caveat: *All* pipeline input is buffered before writing output starts,
but the string representations are generated and written to the target
file one by one.
.NOTES
The raison d'être for this advanced function is that, as of PowerShell v5,
Out-File still lacks the ability to write UTF-8 files without a BOM:
using -Encoding UTF8 invariably prepends a BOM.
#>
function Out-FileUtf8NoBom {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [Parameter(ValueFromPipeline)] $InputObject
  )
  #requires -version 3
  # Make sure that the .NET framework sees the same working dir. as PS
  # and resolve the input path to a full path.
  [System.IO.Directory]::SetCurrentDirectory($PWD) # Caveat: .NET Core doesn't support [Environment]::CurrentDirectory
  $LiteralPath = [IO.Path]::GetFullPath($LiteralPath)
  # If -NoClobber was specified, throw an exception if the target file already
  # exists.
  if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
    Throw [IO.IOException] "The file '$LiteralPath' already exists."
  }
  # Create a StreamWriter object.
  # Note that we take advantage of the fact that the StreamWriter class by default:
  # - uses UTF-8 encoding
  # - without a BOM.
  $sw = New-Object IO.StreamWriter $LiteralPath, $Append
  $htOutStringArgs = @{}
  if ($Width) {
    $htOutStringArgs += @{ Width = $Width }
  }
  # Note: By not using begin / process / end blocks, we're effectively running
  #       in the end block, which means that all pipeline input has already
  #       been collected in automatic variable $Input.
  #       We must use this approach, because using | Out-String individually
  #       in each iteration of a process block would format each input object
  #       with an individual header.
  try {
    $Input | Out-String -Stream @htOutStringArgs | % { $sw.WriteLine($_) }
  } finally {
    $sw.Dispose()
  }
}
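When using Set-Content instead of Out-File, you can specify the encoding Byte, which can be used to write a byte array to a file. Combined with a custom UTF8 encoding that does not emit a BOM, this gives the desired result: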
# This variable can be reused
$utf8 = New-Object System.Text.UTF8Encoding $false
$MyFile = Get-Content $MyPath -Raw
Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath
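The difference from using [IO.File]::WriteAllLines() or similar is that it should work with any type of item and path, not only actual file paths.
Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.
So for the example above it should simply look like this: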
$MyFile | Out-File -Encoding UTF8NoBOM $MyPath
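For scripts that have to run on both Windows PowerShell and version 6 or later, the two routes can be combined behind a version check. This is a sketch only: the function name Write-Utf8NoBom and its parameters are hypothetical, and it uses Set-Content rather than Out-File on the newer branch so that strings are written verbatim.

```powershell
# Hypothetical wrapper: picks whichever BOM-less UTF-8 route the running
# PowerShell version offers.
function Write-Utf8NoBom {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [AllowEmptyString()] [string[]] $Lines
    )

    if ($PSVersionTable.PSVersion.Major -ge 6) {
        # PowerShell 6+ understands the UTF8NoBOM encoding name natively.
        Set-Content -LiteralPath $Path -Value $Lines -Encoding UTF8NoBOM
    }
    else {
        # Windows PowerShell: fall back to .NET, which needs an absolute path.
        $fullPath  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
        $utf8NoBom = New-Object System.Text.UTF8Encoding $false
        [System.IO.File]::WriteAllLines($fullPath, $Lines, $utf8NoBom)
    }
}
```

A call such as Write-Utf8NoBom -Path $MyPath -Lines (Get-Content $MyPath) then behaves the same way on either edition.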
This script converts all .txt files in DIRECTORY1 to UTF-8 without a BOM and writes them to DIRECTORY2:
foreach ($i in ls -name DIRECTORY1\*.txt)
{
$file_content = Get-Content "DIRECTORY1\$i";
[System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
}
[System.IO.FileInfo] $file = Get-Item -Path $FilePath
$sequenceBOM = New-Object System.Byte[] 3
$reader = $file.OpenRead()
$bytesRead = $reader.Read($sequenceBOM, 0, 3)
$reader.Dispose()
# A UTF-8 file with a BOM starts with the following three bytes. Hex: 0xEF 0xBB 0xBF, Decimal: 239 187 191
if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191)
{
$utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
[System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding)
Write-Host "Remove UTF-8 BOM successfully"
}
Else
{
Write-Warning "Not UTF-8 BOM file"
}
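Source: How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell
One technique I use is to redirect output to an ASCII file using the Out-File cmdlet.
For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output is UTF-16, which SQLPlus does not recognize. To work around this: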
sqlplus -s / as sysdba "@create_sql_script.sql" |
Out-File -FilePath new_script.sql -Encoding ASCII -Force
The resulting script can then be executed in another SQLPlus session without any Unicode worries:
sqlplus / as sysdba "@new_script.sql" |
tee new_script.log