How to delete duplicate files with similar names

Question  Votes: 1  Answers: 3

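I am still new to PowerShell and have not been able to find an exact answer to my problem. I have a bunch of Excel files spread across different folders. They are duplicates of each other, but because they were updated along the way, their file names differ. For example:

015 Approved Warranty-Turkey-Case-2019 08-1437015 (issue 3)
015 Approved Warranty-Turkey-Case-2019 08-1437015 (final issue)
015 Approved Warranty-Turkey-Case-2019 08-1437015
015 Approved Warranty-Turkey-Case-2019 08-1437015 revised

I have tried several approaches. I think I now know the simplest way to filter the files, but I do not know the syntax. The anchor would be the case number that follows the date. I want to compare the case numbers with each other, keep only the newest file for each case (by modified date), and delete the rest. Any guidance is appreciated.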

#take files from folder
$dupesource = 'C:\Users\W_Brooker\Documents\Destination\2019\08'

#filter files by case number (7 digit number after date)
$files = Get-ChildItem $dupesource -Filter "08-aaaaaaa"

#If case number is the same keep newest file delete rest
foreach ($file in $files){
$file | Delete-Item - sort -property Datemodified |select -Last 1
}
powershell file duplicates delete-file
3 Answers
0
votes

This should do the trick:

$files = Get-ChildItem 'C:\Users\W_Brooker\Documents\Destination\2019\08' -Recurse

# create datatable to store file Information in it

$dt = New-Object system.Data.DataTable
[void]$dt.Columns.Add('FileName',[string]::Empty.GetType() )
[void]$dt.Columns.Add('CaseNumber',[string]::Empty.GetType() )
[void]$dt.Columns.Add('FileTimeStamp',[DateTime]::MinValue.GetType() )
[void]$dt.Columns.Add('DeleteFlag',[byte]::MinValue.GetType() )

# Step 1: Make inventory

foreach( $file in $files ) {

    if( !$file.PSIsContainer -and $file.Extension -like '.xls*' -and $file.Name -match '^.*\-\d+ *[\(\.].*$' ) {

        $row               = $dt.NewRow()
        $row.FileName      = $file.FullName
        $row.CaseNumber    = $file.Name -replace '^.*\-(\d+) *[\(\.].*$', '$1'
        $row.FileTimeStamp = $file.LastWriteTime
        $row.DeleteFlag    = 0

        [void]$dt.Rows.Add( $row )
    }
}

# Step 2: Mark files to delete

$rows = $dt.Select('', 'CaseNumber, FileTimeStamp DESC')

$caseNumber = ''

foreach( $row in $rows ) {
    if( $row.CaseNumber -ne $caseNumber ) {
        $caseNumber = $row.CaseNumber
        Continue
    }
    $row.DeleteFlag = 1
    [void]$dt.AcceptChanges()
}

# Step 3: Delete files

$rows = $dt.Select('DeleteFlag = 1', 'FileTimeStamp DESC')

foreach( $row in $rows ) {
    $fileName = $row.FileName
    Remove-Item -Path $fileName -Force | Out-Null
}
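
To see which names the inventory step would pick up, here is a minimal sketch that runs the same pattern against a few made-up file names modeled on the question (the sample names and the $report variable are assumptions, not part of the answer). It suggests that a name where other text follows the case number directly, such as a trailing "revised", is not matched by this pattern and would therefore be left alone:

```powershell
# Pattern copied from the answer above; the file names are hypothetical samples.
$pattern = '^.*\-(\d+) *[\(\.].*$'
$samples = '015 Approved Warranty-Turkey-Case-2019 08-1437015 (issue 3).xlsx',
           '015 Approved Warranty-Turkey-Case-2019 08-1437015.xlsx',
           '015 Approved Warranty-Turkey-Case-2019 08-1437015 revised.xlsx'

# Build one report row per sample: did the pattern match, and what was extracted?
$report = foreach ($name in $samples) {
    $matched = $name -match $pattern
    [pscustomobject]@{
        Name       = $name
        Matched    = $matched
        CaseNumber = if ($matched) { $name -replace $pattern, '$1' } else { $null }
    }
}

$report | Format-Table -AutoSize
```

If such names occur in practice, the pattern would need to be loosened before relying on the script to catch every duplicate.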

0
votes

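Here is an alternative approach that uses the PowerShell Group-Object cmdlet.

It uses a regular expression to match files on their case number and ignores files that have none. See the screenshot at the bottom, which shows the test data (a collection of test xlsx files).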

cls

#Assume that each file has an xlsx extension.
#Assume that a case number always looks like this: "Case-YYYY~XX-Z" where YYYY is 4 digits, ~ is a single space, XX is two digits, and Z is one-to-many-digits

#make a list of xlsx files (recursive)
$files = Get-ChildItem -LiteralPath .\ExcelFiles -Recurse -Include *.xlsx 

#$file is a System.IO.FileInfo object. Parse out the Case number and add it to the $file object as CaseNumber property
foreach ($file in $files)
{
    $Matches = $null

    $file.Name -match "(^.*)(Case-\d{4}\s{1}\d{2}-\d{1,})(.*\.xlsx$)" | out-null

    if ($Matches.Count -eq 4)
    {
        $caseNumber = $Matches[2]
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue $caseNumber
    }
    Else
    {
        #child folders will end up in this group too
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue "NoCaseNumber"
    }
}

#group the files by CaseNumber
$files | Group-Object -Property CaseNumber -OutVariable fileGroups | out-null

foreach ($fileGroup in $fileGroups)
{
    #skip folders and files that don't have a valid case #
    if ($fileGroup.Name -eq "NoCaseNumber")
    {
        continue
    }

    #for each group: sort files descending by LastWriteTime. Newest file will be first, so skip 1st file and remove the rest
    $fileGroup.Group | sort -Descending -Property LastWriteTime | select -skip 1 | foreach {Remove-Item -LiteralPath $_.FullName -Force}
}

Test data

[Screenshot of the test xlsx files; image not available]


0
votes

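The idiomatic PowerShell solution is to:

  • combine multiple cmdlets in a single pipeline,

  • in which Group-Object provides the core functionality of grouping duplicate files by the case number shared in their file names: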

# Define the regex that matches a case number:
# A 7-digit number embedded in filenames that duplicates share.
$regex = '\b\d{7}\b' 

# Enumerate all files and select only those whose name contains a case number.
Get-ChildItem -File $dupesource | Where-Object { $_.BaseName -match $regex } | 
  # Group the resulting files by shared embedded case number.
  Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
    # Process each group:
    ForEach-Object {
      # In each group, sort files by most recently updated first.
      $_.Group | Sort-Object -Descending LastWriteTimeUtc |
        # Skip the most recent file and delete the older ones.
        Select-Object -Skip 1 | Remove-Item -WhatIf
    }

The -WhatIf common parameter previews the operation. Remove it once you are sure the operation will do what you want.
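
As a complement to -WhatIf, the grouping logic can be tried on throwaway files first. The following is only a sketch under stated assumptions: the sandbox folder, the sample file names, and the $toDelete variable are hypothetical, and -Recurse is added on the assumption that duplicates may sit in subfolders. It collects the files that would be deleted without removing them:

```powershell
# Hypothetical sandbox: a temporary folder holding three empty files that share one case number.
$sandbox = Join-Path ([System.IO.Path]::GetTempPath()) ("dupe-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $sandbox | Out-Null

$names = 'Warranty-Case-2019 08-1437015.xlsx',
         'Warranty-Case-2019 08-1437015 (issue 3).xlsx',
         'Warranty-Case-2019 08-1437015 (final issue).xlsx'

# Give each file a distinct timestamp; the last name in the list ends up newest.
$offset = 0
foreach ($name in $names) {
    $item = New-Item -ItemType File -Path (Join-Path $sandbox $name)
    $item.LastWriteTime = (Get-Date).AddDays(-10 + $offset)
    $offset++
}

# Same grouping as in the answer, but the candidates are collected instead of removed.
$regex = '\b\d{7}\b'
$toDelete = Get-ChildItem -File -Recurse $sandbox |
    Where-Object { $_.BaseName -match $regex } |
    Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
    ForEach-Object {
        $_.Group | Sort-Object -Descending LastWriteTimeUtc | Select-Object -Skip 1
    }

# List what would be deleted, then clean up the sandbox.
$toDelete | Select-Object Name, LastWriteTime
Remove-Item -LiteralPath $sandbox -Recurse -Force
```

Only the two older files should appear in the listing; the most recently modified one is skipped and kept.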
