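I'm trying to build a simple script that uses regular expressions to match multiple patterns on a single line, looping over the entire input file and writing the results to an output file. But I've hit a wall:
Sample text: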
BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS: 10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC
Here is what I have so far:
$table = [regex] "'.*'"
$discard = [regex] "\d* PHYSICAL"
Select-String -Pattern ($table, $discard) -AllMatches .\test.txt | foreach {
$_.Matches.Value
} > output.txt
Output:
'KDDT111D.DIH0345S'
Desired output:
'KDDT111D.DIH0345S' 10 Physical
For some reason I can't get both patterns written to output.txt. Ideally, once I get this working, I'd like to use Export-Csv to get something a bit cleaner, like:
|KDDT111D|DIH0345S|10 Physical|
I think you'll find the -match operator better suited to this. [grin] Using named capture groups against your sample, stored in $InStuff, this...
$InStuff -match ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"
...gives the following set of matches...
Name Value
---- -----
Space KDDT111D
SubSpace DIH0345S
Discarded 10 PHYSICAL
0 BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345...
The named captures can be accessed via $Matches.<the capture group name>.
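As a minimal sketch of how those captures might be joined into the pipe-delimited form the question asks for (the variable names $pattern and $result are only illustrative, and the sample line is copied from the question):

```powershell
# Sample line from the question, stored in a variable for the demonstration.
$InStuff = "BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS: 10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"

$pattern = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

# With a single string on the left, -match returns $true or $false
# and fills the automatic $Matches variable on success.
if ($InStuff -match $pattern) {
    $result = $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join '|'
    $result   # KDDT111D|DIH0345S|10 PHYSICAL
}
```

Note that $Matches is overwritten by every successful -match, so the values should be read right after the test.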
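You've run into a limitation of Select-String: the .Matches property of the [Microsoft.PowerShell.Commands.MatchInfo] objects that Select-String emits for each input object (line) only ever contains the (potentially multiple) matches for the first regex passed to the -Pattern parameter.[1]
You can work around the problem by passing a single regex instead, formed by combining the input regexes via alternation (|):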
Select-String -Pattern ($table, $discard -join '|') -AllMatches .\test.txt |
ForEach-Object { $_.Matches.Value } > output.txt
A simplified example:
# ('f.', '.z' -join '|') -> 'f.|.z'
'foo bar baz' | Select-String -AllMatches ('f.', '.z' -join '|') |
ForEach-Object { $_.Matches.Value }
The above yields:
fo
az
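which shows that matches for both regexes were reported.
Note the ordering of the output, however: with alternation (|), the matches for a given input string are reported in the order in which they are found in the input, not in the order in which the regexes were specified.
That is, both -Pattern 'f.|.z' and -Pattern '.z|f.' above result in the same output order.
[1] The problem exists as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4 and is discussed in this GitHub issue.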
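If it matters which of the combined regexes produced a given match, one possible approach (not part of the original answer) is to wrap each alternative in a named group and check the group's Success property. The group names first and second and the variable $found below are made up for the illustration:

```powershell
$found = 'foo bar baz' |
    Select-String -AllMatches '(?<first>f.)|(?<second>.z)' |
    ForEach-Object { $_.Matches } |
    ForEach-Object {
        # Group.Success is $true only for the alternative that actually matched.
        if ($_.Groups['first'].Success) { "first: $($_.Value)" }
        else                            { "second: $($_.Value)" }
    }
$found
# first: fo
# second: az
```

The matches still arrive in input order, but each one now carries a label that ties it back to its source regex.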
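Thanks to the contributors for the ideas and the learning experience. I was able to get the desired output by combining the two answers I received.
I found that the -match operator only returned the first occurrence of the regex match from the source file, so I needed to add a foreach loop in order to return the matches across the entire log file.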
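A small sketch of why a per-line loop helps, using made-up strings rather than the real log. With a collection on the left, -match acts as a filter and does not populate $Matches; with a single string, it returns a Boolean and fills $Matches for that string only. Whether this is exactly what happened in my first attempt is an assumption:

```powershell
$lines = 'a1', 'b2', 'c'   # hypothetical stand-ins for log lines

# Collection on the left: -match returns the elements that match.
$filtered = $lines -match '\d'
$filtered                  # a1, b2

# One string at a time: -match returns $true/$false and fills $Matches.
$digits = foreach ($line in $lines) {
    if ($line -match '(?<Digit>\d)') { $Matches.Digit }
}
$digits                    # 1, 2
```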
I also modified the regex so that it only matches lines whose discarded value is greater than 0.
Sample text:
BMC51472I COMBINED PHASE STATISTICS: 0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS: 0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS: 3499604 ROWS SELECTED FOR SPACE 'KDDT000D.KDAIND0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS: 0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS: 1 ROWS SELECTED FOR SPACE 'KDDT000D.KDCISR0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS: 0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS: 9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS: 11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS: 0 ROWS SELECTED FOR SPACE 'KDDT000D.KDICHT0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS: 0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS: 2387375 ROWS SELECTED FOR SPACE 'KDDT000D.KDICMS0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS: 0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS: 1632821 ROWS SELECTED FOR SPACE 'KDDT000D.KDIPRV0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS: 0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS: 0 ROWS SELECTED FOR SPACE 'KDDT000D.KDLADD0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS: 24845 PHYSICAL (24845 LOGICAL) RECORDS DISCARDED TO SYSDISC
Example:
# The trailing [1-9][0-9]* only matches a LOGICAL count greater than 0.
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"
$timestamp = Get-Date -Format "MM_dd_yy"
$dir = "C:\Users\JonMonJovi\"
Get-Content $dir\*.log.txt | Where-Object {
    $_ -match $regex
} | ForEach-Object {
    # $Matches holds the captures from the most recent successful -match.
    $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join "|"
} > C:\Users\JonMonJovi\Discarded\Discard_Log_$timestamp.txt
Output:
KDDT000D|KDIADR0S| 11 PHYSICAL
KDDT000D|KDLADD0S| 24845 PHYSICAL
From here I can import the pipe-delimited .txt output file into Excel, which meets my requirements.
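For the Export-Csv idea mentioned in the question, a rough sketch could build one object per matching line and let PowerShell handle the delimiter. The two input lines are copied from the sample above; $lines and $rows are illustrative names, and ConvertTo-Csv is used here only so the result is visible without writing a file (Export-Csv accepts the same -Delimiter and -NoTypeInformation parameters plus a -Path):

```powershell
$lines = @(
    "BMC51472I COMBINED PHASE STATISTICS: 9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS: 11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC"
    "BMC51472I COMBINED PHASE STATISTICS: 0 ROWS SELECTED FOR SPACE 'KDDT000D.KDICHT0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS: 0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC"
)
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

# Emit one object per line that has a discard count greater than 0.
$rows = foreach ($line in $lines) {
    if ($line -match $regex) {
        [pscustomobject]@{
            Space     = $Matches.Space
            SubSpace  = $Matches.SubSpace
            Discarded = $Matches.Discarded.Trim()   # drop stray leading spaces
        }
    }
}

# Field values are quoted by default.
$rows | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
```

Only the first sample line survives the filter, since the second reports 0 discarded records.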