Removing "NUL" characters from a txt file using Windows cmd

Question

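I have a large text file with millions of lines, and I need to remove the "NUL" characters from it (that is how Notepad++ displays them, see the image). Search and replace works in Notepad++, but it takes a very long time. How can I remove these NUL characters with a Windows command, which might be faster?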

Answer 1

If you can, I would use PowerShell rather than plain cmd, as it is much faster.

Run this in cmd:

powershell -c "(Get-Content .\file.txt) -replace '\x00+', '' | Set-Content .\file.txt"

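This may run into problems with files larger than about 1 GB, because it loads the whole file into memory. For those I recommend using a full PowerShell script instead.

To make it faster, you can use .NET streams in PowerShell: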

#Open file.txt
$reader = [IO.File]::OpenText("file.txt")
#Save the output to file2.txt (can't write to the same file, as it is locked by the StreamReader)
$writer = New-Object System.IO.StreamWriter -ArgumentList ("file2.txt")

#loop over lines in file and replace char
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    #Replace the NUL character with an empty string ("`0" is NUL in PowerShell; '\0' would be the literal text \0)
    $writer.WriteLine($line.Replace("`0", ""))
}

#Close both streams
$reader.Close()
$writer.Close()

Processing a 400 MB file with nearly 2 million lines this way took about 6 seconds.
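
If the line endings and encoding of the file need to stay exactly as they are, a byte-level variant of the same streaming idea may be worth trying. The following is only a sketch under a few assumptions: the function name Remove-NulBytes and its parameters are made up for illustration, the NULs are assumed to be stray bytes (not part of a UTF-16 encoding), and the timing above does not apply to it. It reads fixed-size chunks, round-trips them through ISO-8859-1 (which maps every byte to exactly one character) and drops the NULs in between.

```powershell
# Sketch: strip NUL (0x00) bytes chunk by chunk, leaving all other bytes untouched.
# Remove-NulBytes and its parameter names are hypothetical, for illustration only.
function Remove-NulBytes {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,         # input file (not modified)
        [Parameter(Mandatory = $true)] [string] $Destination,  # output file (overwritten)
        [int] $BufferSize = 4MB
    )

    # .NET resolves relative paths against the process working directory,
    # which can differ from the PowerShell location, so convert to full paths.
    $inPath  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    $outPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    # ISO-8859-1 maps each byte 0-255 to exactly one char, so the round trip is lossless.
    $latin1 = [System.Text.Encoding]::GetEncoding(28591)
    $nul    = [string][char]0   # one-character string containing NUL

    $in = [System.IO.File]::OpenRead($inPath)
    try {
        $out = [System.IO.File]::Create($outPath)
        try {
            $buffer = New-Object byte[] $BufferSize
            while (($read = $in.Read($buffer, 0, $buffer.Length)) -gt 0) {
                # Decode the chunk, drop every NUL, and write the remaining bytes.
                $text = $latin1.GetString($buffer, 0, $read).Replace($nul, '')
                if ($text.Length -gt 0) {
                    $bytes = $latin1.GetBytes($text)
                    $out.Write($bytes, 0, $bytes.Length)
                }
            }
        }
        finally { $out.Dispose() }
    }
    finally { $in.Dispose() }
}
```

It could then be called as Remove-NulBytes -Path .\file.txt -Destination .\file2.txt. Like the script above, it writes to a second file rather than overwriting the input.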

Answer 2

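I found that if every line contains a known text string, you can use the FIND command (but not findstr) and redirect the output to a new file. FIND will "eat" any null characters in the lines it prints.

So if, in your data, every line ends with "." or contains a "." character, you can run: find "." inputfile.txt >outputfile.txt to strip the nulls.

findstr does not work for this, because it prints the matching lines exactly as they appear in the input, nulls included.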
