If possible, I need to remove duplicate lines from multiple text files in a path, in PowerShell.
I have already found a way to get the list of duplicated lines:
Get-Content "$path\*.*" | Group-Object | Where-Object { $_.Count -gt 1 } | Select -ExpandProperty Name
Now I think a foreach loop would be useful, but I don't know how to handle the in-place deletion...
Can anyone help me, please?
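A minimal sketch of the kind of in-place rewrite I have in mind (the folder path is a placeholder, and I'm assuming the reordering done by Sort-Object -Unique is acceptable):

```powershell
# Sketch: de-duplicate each file in place, one file at a time.
# $path is a placeholder for the folder holding the text files.
$path = 'F:\DATA\Urls'
Get-ChildItem -Path $path -File |
    ForEach-Object {
        # Read all lines first, so the file handle is released
        # before the same file is overwritten below.
        $unique = Get-Content -Path $_.FullName | Sort-Object -Unique
        Set-Content -Path $_.FullName -Value $unique
    }
```

Note this removes duplicates only within each single file, not across files.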
EDIT: To avoid misunderstandings, I have changed the title of the question!
EDIT 2 (following Olaf's hint):
PS C:\Users\Robbi> $mypath = "F:\DATA\Urls_CP"
PS C:\Users\Robbi> Get-ChildItem -Path $mypath -Filter * |
>> ForEach-Object{
>> $Content =
>> Get-Content -Path $_.FullName | Sort-Object -Unique
>> $Content | Out-File -FilePath $_.FullName
>> }
PS C:\Users\Robbi> Get-Content $mypath\* | Select-String "https://httpd.apache.org/docs/2.4/mod/mod_md.html"
https://httpd.apache.org/docs/2.4/mod/mod_md.html
https://httpd.apache.org/docs/2.4/mod/mod_md.html
But something has changed: I copied the original "Urls" folder and ran your code on the copy, "Urls_CP"; now "Urls_CP" is about 200 KB larger than the original "Urls".
I should mention that every file was produced with PowerShell from the "access.log" of a Squid proxy on a Linux VM, but I have already checked the encoding and the presence of "strange" characters with Notepad++. (I have no access to the Linux shell.)
This is an extract from one of the files inside the "Urls" folder:
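A likely cause of the size increase (this is my assumption, not something confirmed above): in Windows PowerShell 5.1, Out-File writes UTF-16 LE ("Unicode") by default, which roughly doubles the size of ASCII input such as these URL lists. Passing an explicit single-byte encoding keeps the files their original size:

```powershell
# Same rewrite as in EDIT 2, but with an explicit encoding so the
# output stays the same size as the ASCII input (Windows PowerShell
# 5.1 defaults Out-File to UTF-16 LE, roughly doubling file size).
Get-ChildItem -Path $mypath -File |
    ForEach-Object {
        $Content = Get-Content -Path $_.FullName | Sort-Object -Unique
        $Content | Out-File -FilePath $_.FullName -Encoding ascii
    }
```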
https://community.checkpoint.com/t5/API-CLI-Discussion-and-Samples/can-anybody-let-me-know-how-can-we-import-policy-rules-via-csv/td-p/20839
https://community.checkpoint.com/t5/API-CLI-Discussion-and-Samples/Python-tool-for-exporting-importing-a-policy-package-or-parts-of/td-p/41100
https://community.checkpoint.com/t5/General-Management-Topics/R80-10-API-bug-fallback-to-quot-SmartCenter-Only-quot-after/m-p/5074
https://github.com/CheckPointSW/cp_mgmt_api_python_sdk
https://github.com/CheckPointSW/cpAnsible/issues/2
https://github.com/CheckPointSW/ExportImportPolicyPackage/issues
https://stackoverflow.com/questions/15031694/installing-python-packages-from-local-file-system-folder-to-virtualenv-with-pip
https://stackoverflow.com/questions/24627525/fatal-error-in-launcher-unable-to-create-process-using-c-program-files-x86
https://stackoverflow.com/questions/25749621/whats-the-difference-between-pip-install-and-python-m-pip-install
https://stackoverflow.com/questions/42494229/how-to-pip-install-a-local-python-package
EDIT 3:
Please forgive me, I will try to explain better!
I want to keep the structure of the "Urls" folder, which contains multiple files; I want to remove the duplicates (or replace them with $null) "across all files", while keeping every file in the folder. In other words: NOT one big file containing all the http addresses! In EDIT 2 I showed Olaf that the string "https://httpd.apache.org/docs/2.4/mod/mod_md.html"
is still duplicated, because it exists both in "$mypath\file1.txt"
and in "$mypath\file512.txt".
I have now understood that Olaf's code checks for duplicates "on a per-file basis" (thanks @Lee_Dailey, I finally see what was unclear in my question!).
EDIT 4:
$SourcePath = 'F:\DATA\Urls_CP'
$TargetPath = 'F:\DATA\Urls_CP\DeDupe'

$UrlList = Get-ChildItem -Path $SourcePath -Filter *.txt |
    ForEach-Object {
        $FileName = $_.BaseName
        $FileLWT  = (Get-ItemProperty $_.FullName).LastWriteTime
        Get-Content -Path $_.FullName -Encoding default |
            ForEach-Object {
                [PSCustomObject]@{
                    URL  = $_
                    File = $FileName
                    LWT  = $FileLWT
                }
            }
    }

$UrlList |
    Sort-Object -Property URL -Unique |
    ForEach-Object {
        $TargetFile = Join-Path -Path $TargetPath -ChildPath ($_.File + '.txt')
        $_.URL | Out-File -FilePath $TargetFile -Append -Encoding default
        Set-ItemProperty $TargetFile -Name LastWriteTime -Value $_.LWT
    }