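I'm still quite new to PowerShell, and I haven't been able to find an exact answer to my problem. I have a bunch of Excel files in different folders that are duplicates, but their file names differ because they were updated. For example:
015 Approved Warranty - Turkey - Case-2019 08-1437015 (issue 3)
015 Approved Warranty - Turkey - Case-2019 08-1437015 (Final Issue)
015 Approved Warranty - Turkey - Case-2019 08-1437015
015 Approved Warranty - Turkey - Case-2019 08-1437015 Revised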
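I've tried different approaches, and I think I now know the simplest way to filter the files, but I don't know the syntax. The anchor would be the case number after the date. I want to compare the case numbers with each other, keep only the most recent file for each case (by modification date), and delete the rest. Any guidance is appreciated.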
#take files from folder
$dupesource = 'C:\Users\W_Brooker\Documents\Destination\2019\08'
#filter files by case number (7 digit number after date)
$files = Get-ChildItem $dupesource -Filter "08-aaaaaaa"
#If case number is the same keep newest file delete rest
foreach ($file in $files){
    $file | Delete-Item - sort -property Datemodified |select -Last 1
}
This should do the trick:
$files = Get-ChildItem 'C:\Users\W_Brooker\Documents\Destination\2019\08' -Recurse
# create datatable to store file Information in it
$dt = New-Object system.Data.DataTable
[void]$dt.Columns.Add('FileName',[string]::Empty.GetType() )
[void]$dt.Columns.Add('CaseNumber',[string]::Empty.GetType() )
[void]$dt.Columns.Add('FileTimeStamp',[DateTime]::MinValue.GetType() )
[void]$dt.Columns.Add('DeleteFlag',[byte]::MinValue.GetType() )
# Step 1: Make inventory
foreach( $file in $files ) {
    # Only Excel files whose name contains "-<digits>" ending at a word boundary,
    # e.g. "-1437015 (issue 3)", "-1437015.xlsx" or "-1437015 Revised"
    if( !$file.PSIsContainer -and $file.Extension -like '.xls*' -and $file.Name -match '^.*-\d+\b.*$' ) {
        $row = $dt.NewRow()
        $row.FileName = $file.FullName
        # The greedy '.*' makes this capture the last '-<digits>' run, i.e. the case number
        $row.CaseNumber = $file.Name -replace '^.*-(\d+)\b.*$', '$1'
        $row.FileTimeStamp = $file.LastWriteTime
        $row.DeleteFlag = 0
        [void]$dt.Rows.Add( $row )
    }
}
# Step 2: Mark files to delete
# Rows are sorted by case number, newest first, so the first row of each case is kept
$rows = $dt.Select('', 'CaseNumber, FileTimeStamp DESC')
$caseNumber = ''
foreach( $row in $rows ) {
    if( $row.CaseNumber -ne $caseNumber ) {
        $caseNumber = $row.CaseNumber
        Continue
    }
    $row.DeleteFlag = 1
    [void]$dt.AcceptChanges()
}
# Step 3: Delete files
$rows = $dt.Select('DeleteFlag = 1', 'FileTimeStamp DESC')
foreach( $row in $rows ) {
    $fileName = $row.FileName
    # -LiteralPath avoids treating characters such as '[' in file names as wildcards
    Remove-Item -LiteralPath $fileName -Force
}
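Whichever case-number pattern you use, it is worth checking it against sample names in memory before any file is deleted. A minimal sketch, assuming the pattern `-(\d{7})\b` (a hyphen followed by exactly seven digits, based on the question's description of the case number) and names modeled on the question's examples with an assumed `.xlsx` extension; the last name is a hypothetical file with no case number:

```powershell
# Names modeled on the question's examples; the .xlsx extension is an assumption.
$sampleNames = @(
    '015 Approved Warranty - Turkey - Case-2019 08-1437015 (issue 3).xlsx'
    '015 Approved Warranty - Turkey - Case-2019 08-1437015 (Final Issue).xlsx'
    '015 Approved Warranty - Turkey - Case-2019 08-1437015.xlsx'
    '015 Approved Warranty - Turkey - Case-2019 08-1437015 Revised.xlsx'
    'Unrelated notes.xlsx'
)

# Assumed pattern: a hyphen, exactly seven digits, then a word boundary.
$casePattern = '-(\d{7})\b'

$results = foreach ($name in $sampleNames) {
    if ($name -match $casePattern) {
        # $Matches[1] holds the captured seven-digit case number.
        [pscustomobject]@{ Name = $name; CaseNumber = $Matches[1] }
    }
    else {
        # No case number found: a case-number filter would skip this file.
        [pscustomobject]@{ Name = $name; CaseNumber = $null }
    }
}

$results | Format-Table -AutoSize
```

Names that come back with an empty CaseNumber are the ones a case-number filter would skip, so this is a cheap way to spot files the pattern misses.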
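Here is an alternative approach that makes use of the PowerShell Group-Object cmdlet.
It uses a regular expression to match files on the case number and ignores files that have no case number. See the screenshot at the bottom showing the test data (a collection of test xlsx files).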
cls
#Assume that each file has an xlsx extension.
#Assume that a case number always looks like this: "Case-YYYY~XX-Z" where YYYY is 4 digits, ~ is a single space, XX is two digits, and Z is one-to-many-digits
#make a list of xlsx files (recursive)
$files = Get-ChildItem -LiteralPath .\ExcelFiles -Recurse -Include *.xlsx
#$file is a System.IO.FileInfo object. Parse out the Case number and add it to the $file object as CaseNumber property
foreach ($file in $files)
{
    $Matches = $null
    $file.Name -match "(^.*)(Case-\d{4}\s{1}\d{2}-\d{1,})(.*\.xlsx$)" | out-null
    if ($Matches.Count -eq 4)
    {
        $caseNumber = $Matches[2]
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue $caseNumber
    }
    Else
    {
        #child folders will end up in this group too
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue "NoCaseNumber"
    }
}
#group the files by CaseNumber
$files | Group-Object -Property CaseNumber -OutVariable fileGroups | out-null
foreach ($fileGroup in $fileGroups)
{
    #skip folders and files that don't have a valid case #
    if ($fileGroup.Name -eq "NoCaseNumber")
    {
        continue
    }
    #for each group: sort files descending by LastWriteTime. Newest file will be first, so skip 1st file and remove the rest
    $fileGroup.Group | sort -Descending -Property LastWriteTime | select -skip 1 | foreach {Remove-Item -LiteralPath $_.FullName -Force}
}
[Screenshot: test data]
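The grouping logic can be tried without touching the disk. This sketch uses hypothetical `[pscustomobject]` stand-ins (with assumed names and dates) instead of real `FileInfo` objects, applies the same regex and the same `Group-Object` / `Sort-Object` / `Select-Object -Skip 1` steps as the script above, and only collects the names that would be removed:

```powershell
# Hypothetical stand-ins for FileInfo objects; only Name and LastWriteTime are modeled.
$fakeFiles = @(
    [pscustomobject]@{ Name = 'Warranty - Case-2019 08-1437015 (issue 3).xlsx';     LastWriteTime = [datetime]'2019-08-01' }
    [pscustomobject]@{ Name = 'Warranty - Case-2019 08-1437015 (Final Issue).xlsx'; LastWriteTime = [datetime]'2019-08-20' }
    [pscustomobject]@{ Name = 'Warranty - Case-2019 08-1437015.xlsx';               LastWriteTime = [datetime]'2019-07-15' }
    [pscustomobject]@{ Name = 'Other - Case-2019 08-1437999.xlsx';                  LastWriteTime = [datetime]'2019-08-05' }
    [pscustomobject]@{ Name = 'No case number here.xlsx';                           LastWriteTime = [datetime]'2019-08-30' }
)

# Tag each object with a CaseNumber note property, as the script does with Add-Member.
foreach ($file in $fakeFiles) {
    if ($file.Name -match '(^.*)(Case-\d{4}\s{1}\d{2}-\d{1,})(.*\.xlsx$)') {
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue $Matches[2]
    }
    else {
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue 'NoCaseNumber'
    }
}

# Group, sort newest first, and collect everything except the first entry of each group.
$toDelete = foreach ($group in ($fakeFiles | Group-Object -Property CaseNumber)) {
    if ($group.Name -eq 'NoCaseNumber') { continue }
    $group.Group | Sort-Object -Property LastWriteTime -Descending | Select-Object -Skip 1
}

# Names that would be passed to Remove-Item; nothing is deleted here.
$toDelete | ForEach-Object Name
```

With these assumed dates, the "(Final Issue)" copy is the newest in its group and is kept, and the single-file group for 1437999 produces nothing to delete.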
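The PowerShell-idiomatic solution is to combine multiple cmdlets in a single pipeline, in which Group-Object provides the core functionality of grouping the duplicate files by the case number they share in their filenames: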
# Define the regex that matches a case number:
# A 7-digit number embedded in filenames that duplicates share.
$regex = '\b\d{7}\b'
# Enumerate all files and select only those whose name contains a case number.
Get-ChildItem -File $dupesource | Where-Object { $_.BaseName -match $regex } |
  # Group the resulting files by shared embedded case number.
  Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
  # Process each group:
  ForEach-Object {
    # In each group, sort files by most recently updated first.
    $_.Group | Sort-Object -Descending LastWriteTimeUtc |
      # Skip the most recent file and delete the older ones.
      Select-Object -Skip 1 | Remove-Item -WhatIf
  }
The -WhatIf common parameter previews the operation. Remove it once you are sure the command will do what you want.
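To watch the pipeline actually delete files without risking real data, you can run it against a throwaway folder. A minimal sketch, assuming a temporary directory under `[System.IO.Path]::GetTempPath()` and hypothetical file names and modification dates; `-WhatIf` is deliberately omitted here because only test files are involved, and the folder is removed at the end:

```powershell
# Hypothetical setup: a throwaway folder under the system temp directory.
$testDir = Join-Path ([System.IO.Path]::GetTempPath()) ('dupe-test-' + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $testDir

# Hypothetical file names modeled on the question, with assumed modification dates.
$samples = [ordered]@{
    'Warranty - Case-2019 08-1437015 (issue 3).xlsx'     = '2019-08-01'
    'Warranty - Case-2019 08-1437015 (Final Issue).xlsx' = '2019-08-20'
    'Warranty - Case-2019 08-1437015.xlsx'               = '2019-07-15'
    'Warranty - Case-2019 08-1437015 Revised.xlsx'       = '2019-08-10'
    'Notes without a case number.xlsx'                   = '2019-06-01'
}
foreach ($entry in $samples.GetEnumerator()) {
    $item = New-Item -ItemType File -Path (Join-Path $testDir $entry.Key)
    # Backdate each file so the sort has distinct timestamps to work with.
    $item.LastWriteTime = [datetime]$entry.Value
}

# The same pipeline as above, minus -WhatIf, pointed at the test folder.
$regex = '\b\d{7}\b'
Get-ChildItem -File $testDir | Where-Object { $_.BaseName -match $regex } |
  Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
  ForEach-Object {
    $_.Group | Sort-Object -Descending LastWriteTimeUtc |
      Select-Object -Skip 1 | Remove-Item
  }

# Only the newest copy for the case number, plus the unrelated file, should remain.
$remaining = @((Get-ChildItem -File $testDir).Name | Sort-Object)
$remaining

# Clean up the test folder.
Remove-Item -LiteralPath $testDir -Recurse -Force
```

Files without a 7-digit number are never passed to Group-Object, which is why the notes file survives untouched.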