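I have a text file containing IP addresses and timestamps. If the timestamps of a repeated IP differ by more than 10 minutes, I want to export that IP to a text file. For example, given something like this: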
1.1.1.1 2024-03-15T13:01:13
2.2.2.2 2024-03-15T17:02:11
1.1.1.1 2024-03-15T13:15:25
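In the output text file, I want to export that IP (1.1.1.1), because its timestamps differ by more than 10 minutes.
I tried the following, but the output is empty. Any help or guidance on where I went wrong would be appreciated.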
`# Function to extract IP addresses and their timestamps from a text file
function Get-IPsWithTimestamps {
    param (
        [string]$FilePath
    )
    # Read the content of the file
    $fileContent = Get-Content -Path $FilePath -Raw
    # Use regex to find IP addresses and timestamps in yy-MM-DD HH-MM-SS format
    $matches = [regex]::Matches($fileContent, "(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\s+(\d{2}-\d{2}-\d{2}\s+\d{2}-\d{2}-\d{2})")
    # Create a hashtable to store IP addresses and their timestamps
    $ipTimestamps = @{}
    # Iterate through matches and update timestamps for each IP
    foreach ($match in $matches) {
        $ip = $match.Groups[1].Value
        $timestamp = [datetime]::ParseExact($match.Groups[2].Value, "yy-MM-dd HH-mm-ss", $null)
        $ipTimestamps[$ip] += , $timestamp
    }
    return $ipTimestamps
}
# Function to filter duplicate IPs with occurrence time difference greater than 10 minutes
function Get-DuplicateIPsWithTimeDifference {
    param (
        [hashtable]$IPsWithTimestamps
    )
    $duplicateIPsWithTimeDifference = @()
    # Iterate through hashtable and filter duplicate IP addresses
    foreach ($ip in $IPsWithTimestamps.Keys) {
        $timestamps = $IPsWithTimestamps[$ip]
        if ($timestamps.Count -ge 2) {
            $firstTimestamp = $timestamps[0]
            $lastTimestamp = $timestamps[-1]
            $timeDifference = $lastTimestamp - $firstTimestamp
            if ($timeDifference.TotalMinutes -gt 10) {
                $duplicateIPsWithTimeDifference += $ip
            }
        }
    }
    return $duplicateIPsWithTimeDifference
}
# Example usage
$filePath = "C:\path\to\your\textfile.txt"
$ipTimestamps = Get-IPsWithTimestamps -FilePath $filePath
$duplicateIPsWithTimeDifference = Get-DuplicateIPsWithTimeDifference -IPsWithTimestamps $ipTimestamps
# Export unique duplicate IP addresses with time difference greater than 10 minutes to a text file
$duplicateIPsWithTimeDifference | Out-File -FilePath "C:\path\to\output\duplicate_ips_with_time_difference.txt"`
Your code looks fine, except that the regex pattern is incorrect, specifically the `DateTime` capture group:
(\d{2}-\d{2}-\d{2}\s+\d{2}-\d{2}-\d{2})
It should be:
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})
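Note that once the pattern matches, the format string passed to `[datetime]::ParseExact` in the question (`"yy-MM-dd HH-mm-ss"`) no longer describes the captured text either, so it would need to change as well. A minimal sketch, assuming the timestamps always look like the sample data (`2024-03-15T13:01:13`); quoting the literals and using the invariant culture are my own choices here, not something the question requires:

```powershell
# Assumption: timestamps have the shape 2024-03-15T13:01:13, as in the sample data.
$pattern = '(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})'
$match = [regex]::Match('1.1.1.1 2024-03-15T13:01:13', $pattern)

# Quote the literal characters and parse with the invariant culture,
# so the result does not depend on the machine's regional settings.
$format = "yyyy-MM-dd'T'HH':'mm':'ss"
$timestamp = [datetime]::ParseExact(
    $match.Groups[2].Value,
    $format,
    [cultureinfo]::InvariantCulture
)

$ip = $match.Groups[1].Value
"$ip -> $($timestamp.ToString('s'))"
```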
In addition, if you want to simplify the whole pattern, you could probably use:
((?:\d{1,3}\.){3}\d{1,3})\s+(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2})
See https://regex101.com/r/7jJy8X/1 for details.
However, if the file always consists of an IP address, a space, a timestamp, and a newline, repeated, it is simpler to use `-split`:
$ip, $timestamp = '1.1.1.1 2024-03-15T13:01:13' -split '\s+'
$ip # 1.1.1.1
$timestamp # 2024-03-15T13:01:13
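Applied to a whole file, the `-split` approach might look like the sketch below. It assumes every non-blank line has exactly the form `IP timestamp`; the `$lines` array is hypothetical sample data standing in for the output of `Get-Content` called without `-Raw`, and `$records` is just an illustrative name:

```powershell
# Hypothetical sample lines; in practice these would come from
# `Get-Content -Path $filePath` (without -Raw, so one string per line).
$lines = @(
    '1.1.1.1 2024-03-15T13:01:13'
    '2.2.2.2 2024-03-15T17:02:11'
    ''
    '1.1.1.1 2024-03-15T13:15:25'
)

$records = foreach ($line in $lines) {
    # Skip blank lines so they do not produce empty records.
    if ([string]::IsNullOrWhiteSpace($line)) { continue }

    # Split on whitespace: the first item is the IP, the second the timestamp text.
    $ip, $timestamp = $line.Trim() -split '\s+'

    [pscustomobject]@{
        IpAddress = $ip
        Timestamp = [datetime]::ParseExact(
            $timestamp,
            "yyyy-MM-dd'T'HH':'mm':'ss",
            [cultureinfo]::InvariantCulture
        )
    }
}

$records
```

The resulting objects have the same `IpAddress` and `Timestamp` properties as a regex-based parse would produce, so they can be piped to `Group-Object IpAddress` in the same way.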
As for how to approach the code itself, I think you can make it simpler by using `Group-Object`:
# sample data, actually comes from `Get-Content -Raw`
$fileContent = @'
1.1.1.1 2024-03-15T13:01:13
2.2.2.2 2024-03-15T17:02:11
1.1.1.1 2024-03-15T13:15:25
'@
# group the objects by IP and filter out all groups with a single object
$pattern = '((?:\d{1,3}\.){3}\d{1,3})\s+(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2})'
$groups = [regex]::Matches($fileContent, $pattern) | ForEach-Object {
    [pscustomobject]@{
        IpAddress = $_.Groups[1].Value
        Timestamp = $_.Groups[2].Value -as [datetime]
    }
} | Group-Object IpAddress | Where-Object Count -GT 1
# for each group, first sort the timestamps (if the file is actually sorted then this step could be removed)
# then get a timespan of the last and first items in the group and check if
# the time difference is greater than 10 minutes, if so, output the ip address
$limitspan = [timespan]::FromMinutes(10)
$groups | ForEach-Object {
    $sorted = $_.Group.Timestamp | Sort-Object
    # if the file is sorted, here you can use:
    # `$_.Group[-1].Timestamp - $_.Group[0].TimeStamp -gt $limitspan`
    if ($sorted[-1] - $sorted[0] -gt $limitspan) {
        $_.Name
    }
} | Out-File -FilePath 'C:\path\to\output\duplicate_ips_with_time_difference.txt'