Group-Object


If possible, I need to remove duplicate lines from multiple text files in one path, in PowerShell.

I have already found a way to get the list of duplicated lines:

Get-Content "$path\*.*" | Group-Object | Where-Object { $_.Count -gt 1 } | Select -ExpandProperty Name

Now I think a foreach loop would be useful, but I don't know how to handle the in-place deletion...

Can anyone help me, please?
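One way to turn the Group-Object list into an in-place deletion is to collect the duplicated lines first, then rewrite each file keeping only the lines not in that list. Note this sketch removes every copy of a duplicated line, not just the extras; the path and the *.txt filter are assumptions:

```powershell
# Sketch: find lines that occur more than once across the files,
# then rewrite each file in place without those lines.
$path = 'F:\DATA\Urls'   # hypothetical path

$dupes = Get-Content "$path\*.txt" |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name

Get-ChildItem -Path $path -Filter *.txt | ForEach-Object {
    # Read the whole file into memory first, then overwrite it.
    $kept = Get-Content -Path $_.FullName |
        Where-Object { $dupes -notcontains $_ }
    Set-Content -Path $_.FullName -Value $kept
}
```

Reading with Get-Content completes before Set-Content reopens the file for writing, which is what makes the in-place rewrite safe.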

EDIT: To avoid misunderstanding, I have changed the title of the question!

EDIT 2 (following Olaf's hint):

PS C:\Users\Robbi> $mypath = "F:\DATA\Urls_CP"
PS C:\Users\Robbi> Get-ChildItem -Path $mypath -Filter * |
>>     ForEach-Object{
>>         $Content =
>>         Get-Content -Path $_.FullName | Sort-Object -Unique
>>         $Content | Out-File -FilePath $_.FullName
>>     }

PS C:\Users\Robbi> Get-Content $mypath\* | Select-String "https://httpd.apache.org/docs/2.4/mod/mod_md.html"

https://httpd.apache.org/docs/2.4/mod/mod_md.html
https://httpd.apache.org/docs/2.4/mod/mod_md.html

But something has changed: I made a copy of the original "Urls" folder and ran your code on the copied "Urls_CP" folder; "Urls_CP" is now about 200 KB larger than the original "Urls".
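A likely cause of the size growth: in Windows PowerShell, Out-File without -Encoding writes UTF-16LE ("Unicode"), which roughly doubles the size of ASCII text such as URLs. A minimal sketch of the same loop with the encoding pinned ($mypath as in the session above):

```powershell
# Rewrite each file with an explicit single-byte encoding so the
# output stays the same size as ASCII input.
$mypath = 'F:\DATA\Urls_CP'   # path from the session above
Get-ChildItem -Path $mypath -Filter *.txt | ForEach-Object {
    $content = Get-Content -Path $_.FullName | Sort-Object -Unique
    $content | Out-File -FilePath $_.FullName -Encoding ascii
}
```

For comparison, writing the same text with -Encoding unicode produces a noticeably larger file than -Encoding ascii.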

I should mention that every file was produced with PowerShell from the "access.log" of a Squid proxy on a Linux VM, but I have already checked the encoding and the presence of "strange" characters with Notepad++. (I don't have access to the Linux shell.)

This is an extract from one of the files inside the "Urls" folder:

https://community.checkpoint.com/t5/API-CLI-Discussion-and-Samples/can-anybody-let-me-know-how-can-we-import-policy-rules-via-csv/td-p/20839
https://community.checkpoint.com/t5/API-CLI-Discussion-and-Samples/Python-tool-for-exporting-importing-a-policy-package-or-parts-of/td-p/41100
https://community.checkpoint.com/t5/General-Management-Topics/R80-10-API-bug-fallback-to-quot-SmartCenter-Only-quot-after/m-p/5074
https://github.com/CheckPointSW/cp_mgmt_api_python_sdk
https://github.com/CheckPointSW/cpAnsible/issues/2
https://github.com/CheckPointSW/ExportImportPolicyPackage/issues
https://stackoverflow.com/questions/15031694/installing-python-packages-from-local-file-system-folder-to-virtualenv-with-pip
https://stackoverflow.com/questions/24627525/fatal-error-in-launcher-unable-to-create-process-using-c-program-files-x86
https://stackoverflow.com/questions/25749621/whats-the-difference-between-pip-install-and-python-m-pip-install
https://stackoverflow.com/questions/42494229/how-to-pip-install-a-local-python-package

EDIT 3:

Please forgive me, I will try to explain better!

I want to keep the structure of the "Urls" folder, which contains multiple files; I want to remove duplicates (or replace them with $null) "across all files", while keeping every file in the folder — i.e. not one big file containing all the HTTP addresses!

In EDIT 2 I showed Olaf that the string "https://httpd.apache.org/docs/2.4/mod/mod_md.html" is still duplicated, because it exists both in "$mypath\file1.txt" and in "$mypath\file512.txt". I now understand that Olaf's code checks for duplicates "on a per-file basis" (thanks to @Lee_Dailey, I finally see what was unclear in my question!).
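Deduplicating "across all files" while keeping every file can be sketched with a single HashSet shared over the whole folder: the first file that contains a URL keeps it, and later occurrences are dropped. HashSet.Add returns $true only when the item was not already present, which makes it usable directly as a filter. The path, the *.txt filter, and the alphabetical processing order are assumptions:

```powershell
# Sketch: keep only the FIRST occurrence of each line across ALL
# files, rewriting every file in place.
$mypath = 'F:\DATA\Urls_CP'   # hypothetical path
$seen = [System.Collections.Generic.HashSet[string]]::new()

# Process files in a stable order so "first occurrence" is well defined.
Get-ChildItem -Path $mypath -Filter *.txt | Sort-Object Name |
    ForEach-Object {
        $kept = Get-Content -Path $_.FullName |
            Where-Object { $seen.Add($_) }   # $false => already seen, drop
        Set-Content -Path $_.FullName -Value $kept
    }
```

A file whose lines were all seen earlier ends up empty but is not deleted, which preserves the folder structure.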

EDIT 4:

$SourcePath = 'F:\DATA\Urls_CP'
$TargetPath = 'F:\DATA\Urls_CP\DeDupe'

$UrlList = Get-ChildItem -Path $SourcePath -Filter *.txt |
    ForEach-Object {
        $FileName = $_.BaseName
        $FileLWT = (Get-ItemProperty $_.FullName).LastWriteTime
        Get-Content -Path $_.FullName -Encoding default |
            ForEach-Object {
                [PSCustomObject]@{
                    URL = $_
                    File = $FileName
                    LWT = $FileLWT
                }
            }
    }

$UrlList | 
    Sort-Object -Property URL -Unique |
        ForEach-Object {
            $TargetFile = Join-Path -Path $TargetPath -ChildPath ($_.File + '.txt')
            $_.URL | Out-File -FilePath $TargetFile -Append -Encoding default
            Set-ItemProperty $TargetFile -Name LastWriteTime -Value $_.LWT
        }
powershell text-files exec data-manipulation in-place