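How to find and replace text in a Windows text file using search-and-replace strings that are easy to read and easy to add, modify, or delete. This script parses a 6,800-line file, finds 70 instances of the search strings, renumbers them, and overwrites the original file in under 400 ms.
The search strings are "AROUND LINE {1-9999}" and "LINE2 {1-9999}", and {1-9999} is replaced with the {line number} on which the code appears. Each search string has a leading and a trailing space. The last two tests were run with the entire source batch file copied and pasted into sample.bat.
sample.bat contains two lines: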
ECHO AROUND LINE 5936
TITLE %TIME% DISPLAY TCP-IP SETTINGS LINE2 5937
Current code, which searches for both AROUND LINE and LINE2 and incorporates @mklement0's solution:
copy-item $env:temp\sample.bat -d $env:temp\sample.bat.$((get-date).tostring("HHmmss"))
$file = "$env:temp\sample.bat"
$lc = 0
$updatedLines = switch -Regex ([IO.File]::ReadAllLines($file)) {
'^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$' { $Matches[1] + ++$lc + $Matches[2] }
default { ++$lc; $_ }
}
[IO.File]::WriteAllLines($file, $updatedLines, [Text.Encoding]::ASCII)
Expected results:
ECHO AROUND LINE 1
TITLE %TIME% DISPLAY TCP-IP SETTINGS LINE2 2
Actual results:
ECHO AROUND LINE 1
TITLE %TIME% DISPLAY TCP-IP SETTINGS LINE2 2
Measured using switch, the .NET Framework, and the entire batch file pasted into sample.bat:
Measure-command {
copy-item $env:temp\sample.bat -d $env:temp\sample.bat.$((get-date).tostring("HHmmss"))
$file = "$env:temp\sample.bat"
$lc = 0
$updatedLines = switch -Regex ([IO.File]::ReadAllLines($file)) {
'^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$' { $Matches[1] + ++$lc + $Matches[2] }
default { ++$lc; $_ }
}
[IO.File]::WriteAllLines($file, $updatedLines, [Text.Encoding]::ASCII)}
Results: 75 ms to 386 ms over 10 runs.
Measured using Get-Content + -replace + Set-Content and the entire batch file pasted into sample.bat:
Measure-command {
copy-item $env:temp\sample.bat -d $env:temp\sample.bat.$((get-date).tostring("HHmmss"))
(gc $env:temp\sample.bat) | foreach -Begin {$lc = 1} -Process {
$_ -replace 'AROUND LINE \d+', "AROUND LINE $lc" -replace 'LINE2 \d+', "LINE2 $lc"
++$lc
} | sc -Encoding Ascii $env:temp\sample.bat}
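Results: 363 ms to 451 ms over 10 runs.
Each search string is an easy-to-understand regular expression.
You can search for additional strings by appending another -replace: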
-replace 'AROUND LINE \d+', "AROUND LINE $lc" -replace 'LINE2 \d+', "LINE2 $lc" -replace 'LINE3 \d+', "LINE3 $lc"
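As a rough illustration of the chained -replace approach, the sketch below applies three replacements to a few in-memory lines rather than to a file. The $lines array and the LINE3 sample line are made up for the example; only the AROUND LINE and LINE2 lines come from sample.bat.

```powershell
# Hypothetical in-memory input; the real script reads sample.bat instead.
$lines = @(
    'ECHO AROUND LINE 5936',
    'TITLE %TIME% DISPLAY TCP-IP SETTINGS LINE2 5937',
    'REM no marker on this line',
    'ECHO STEP DONE LINE3 12'
)

$lc = 1
$result = foreach ($line in $lines) {
    # Each -replace operates on the output of the previous one;
    # a line without any marker passes through unchanged.
    $line -replace 'AROUND LINE \d+', "AROUND LINE $lc" -replace 'LINE2 \d+', "LINE2 $lc" -replace 'LINE3 \d+', "LINE3 $lc"
    ++$lc   # count every line, matching or not
}
$result
```

Because the counter is incremented for every line, the fourth line ends in LINE3 4 even though the third line contains no marker.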
The evolution of this question, from newest to oldest: 1. 54757890 2. 54737787 3. 54712715 4. 54682186
Update: I used @mklement0's regex solution.
switch -Regex -File $file {
'^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$' { $Matches[1] + ++$lc + $Matches[2] }
default { ++$lc; $_ }
}
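The regex ^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$ contains only 2 capture groups: the part of the line before the number to replace (\d+) and the part of the line after it. You must therefore refer to these groups by indices 1 and 2 in the automatic $Matches variable when building the output (not 2 and 3).
Note that (?:...) is a non-capturing group, so by design it is not reflected in $Matches.
Instead of reading the file with [IO.File]::ReadAllLines($file), this solution uses the -File option of switch, which reads the lines directly from the file $file.
The ++$lc in default { ++$lc; $_ } ensures that the line counter is also incremented for non-matching lines before the line at hand ($_) is passed through.
# Enclose the switch statement in & { ... } to speed it up slightly.
$updatedLines = & { switch -Regex -File ... }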
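To make the group numbering concrete, here is a minimal sketch that runs the same regex against one of the two sample lines from the question using the -match operator. The variable names ($pattern, $sample, $prefix, $suffix, $rebuilt) are illustrative only, and the literal 2 stands in for the line counter.

```powershell
$pattern = '^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$'
$sample  = 'TITLE %TIME% DISPLAY TCP-IP SETTINGS LINE2 5937'

if ($sample -match $pattern) {
    # $Matches[0] holds the whole match. Only the two parenthesized capture
    # groups get numbered entries; the (?:...) group does not.
    $prefix  = $Matches[1]   # text up to and including the marker and its trailing space
    $suffix  = $Matches[2]   # text after the old number (empty for this line)
    $rebuilt = $prefix + 2 + $suffix
}
$rebuilt
```

Here $prefix already ends with the space that follows LINE2, so appending the new number directly yields a correctly spaced line.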
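Using a precompiled [regex] instance rather than a string literal that PowerShell converts to a regex behind the scenes can speed things up further; see the benchmarks below.
If case-sensitive matching is acceptable, adding the -CaseSensitive option to the switch statement squeezes out a little more performance.
What makes the solution fast is the use of switch -File to process the lines and, more generally, the use of .NET types rather than cmdlets for file I/O ([IO.File]::WriteAllLines() in this case, as shown in the question); see also this related answer.
That said, marsze's answer offers a highly optimized foreach loop approach based on a precompiled regex, which becomes the faster option as the iteration count grows; it is, however, more verbose.
The benchmarks below compare the performance of the switch approach with marsze's foreach approach.
The & { ... } optimization was also added to the switch command.
The IgnoreCase and CultureInvariant options were added to the foreach approach to match the options that PowerShell regexes use implicitly.
Instead of the 6-line sample file, performance is tested with 600-line, 3,000-line, and 30,000-line files to show the effect of the iteration count on performance.
The results are averages over 100 runs.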
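Sample results from a Windows 10 machine running Windows PowerShell v5.1. The absolute times are not important, but the relative performance shown in the Factor column should be generally representative: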
VERBOSE: Averaging 100 runs with a 600-line file of size 0.03 MB...
Factor Secs (100-run avg.) Command
------ ------------------- -------
1.00 0.023 # switch -Regex -File with regex string literal...
1.16 0.027 # foreach with precompiled regex and [regex].Match...
1.23 0.028 # switch -Regex -File with precompiled regex...
VERBOSE: Averaging 100 runs with a 3000-line file of size 0.15 MB...
Factor Secs (100-run avg.) Command
------ ------------------- -------
1.00 0.063 # foreach with precompiled regex and [regex].Match...
1.11 0.070 # switch -Regex -File with precompiled regex...
1.15 0.073 # switch -Regex -File with regex string literal...
VERBOSE: Averaging 100 runs with a 30000-line file of size 1.47 MB...
Factor Secs (100-run avg.) Command
------ ------------------- -------
1.00 0.252 # foreach with precompiled regex and [regex].Match...
1.24 0.313 # switch -Regex -File with precompiled regex...
1.53 0.386 # switch -Regex -File with regex string literal...
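Note that at lower iteration counts, switch -Regex with a string literal is fastest, but at around 1,500 lines the foreach solution with a precompiled [regex] instance starts to become faster. Using a precompiled [regex] instance with switch -Regex pays off to a lesser degree, and only at higher iteration counts.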
Benchmark code, using the Time-Command function:
# Sample file content (6 lines)
$fileContent = @'
TITLE %TIME% NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE LINE2 1243
TITLE %TIME% DOC/SET YQJ8 LINE2 1887
SET ztitle=%TIME%: WINFOLD LINE2 2557
TITLE %TIME% _*.* IN WINFOLD LINE2 2597
TITLE %TIME% %%ZDATE1%% YQJ25 LINE2 3672
TITLE %TIME% FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 4922

'@
# Determine the full path to a sample file.
# NOTE: Using the *full* path is a *must* when calling .NET methods, because
# the latter generally don't see the same working dir. as PowerShell.
$file = "$PWD/test.bat"
# Note: input is the number of 6-line blocks to write to the sample file,
# which amounts to 600 vs. 3,000 vs. 30,000 lines.
100, 500, 5000 | % {
# Create the sample file with the sample content repeated N times.
$repeatCount = $_
[IO.File]::WriteAllText($file, $fileContent * $repeatCount)
# Warm up the file cache and count the lines.
$lineCount = [IO.File]::ReadAllLines($file).Count
# Define the commands to compare as an array of scriptblocks.
$commands =
{ # switch -Regex -File with regex string literal
& {
$i = 0
$updatedLines = switch -Regex -File $file {
'^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$' { $Matches[1] + ++$i + $Matches[2] }
default { ++$i; $_ }
}
[IO.File]::WriteAllLines($file, $updatedLines, [text.encoding]::ASCII)
}
}, { # switch -Regex -File with precompiled regex
& {
$i = 0
$regex = [Regex]::new('^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$', 'Compiled, IgnoreCase, CultureInvariant')
$updatedLines = switch -Regex -File $file {
$regex { $Matches[1] + ++$i + $Matches[2] }
default { ++$i; $_ }
}
[IO.File]::WriteAllLines($file, $updatedLines, [text.encoding]::ASCII)
}
}, { # foreach with precompiled regex and [regex].Match
& {
$regex = [Regex]::new('^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$', 'Compiled, IgnoreCase, CultureInvariant')
$i = 0
$updatedLines = foreach ($line in [IO.File]::ReadLines($file)) {
$i++
$m = $regex.Match($line)
if ($m.Success) {
$g = $m.Groups
$g[1].Value + $i + $g[2].Value
} else { $line }
}
[IO.File]::WriteAllLines($file, $updatedLines, [Text.Encoding]::ASCII)
}
}
# How many runs to average.
$runs = 100
Write-Verbose -vb "Averaging $runs runs with a $lineCount-line file of size $('{0:N2} MB' -f ((Get-Item $file).Length / 1mb))..."
Time-Command -Count $runs -ScriptBlock $commands | Out-Host
}
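Time-Command is not a built-in cmdlet and has to be defined separately. If it is not available, a rough stand-in can be built on Measure-Command, as sketched below. Get-AverageSeconds is a hypothetical helper written for this illustration; it only averages the run times of a single script block and does not produce the Factor column shown in the results above.

```powershell
# Hypothetical helper (not the Time-Command function used above): runs a
# script block $Count times and returns the average duration in seconds.
function Get-AverageSeconds {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $Count = 10
    )
    $total = 0.0
    foreach ($n in 1..$Count) {
        # Measure-Command returns a TimeSpan for one execution.
        $total += (Measure-Command $ScriptBlock).TotalSeconds
    }
    $total / $Count
}

# Usage with a trivial placeholder command; substitute one of the
# benchmark script blocks to time the real work.
$avg = Get-AverageSeconds -Count 5 -ScriptBlock { 1..1000 | ForEach-Object { $_ * 2 } }
'{0:N3} s average' -f $avg
```

Calling the helper once per script block and dividing each average by the smallest one would give a comparable relative ranking.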
Alternative:
$regex = [Regex]::new('^(.*? (?:AROUND LINE|LINE2) )\d+(.*)$', 'Compiled, IgnoreCase, CultureInvariant')
$lc = 0
$updatedLines = & {foreach ($line in [IO.File]::ReadLines($file)) {
$lc++
$m = $regex.Match($line)
if ($m.Success) {
$g = $m.Groups
$g[1].Value + $lc + $g[2].Value
} else { $line }
}}
[IO.File]::WriteAllLines($file, $updatedLines, [Text.Encoding]::ASCII)