PowerShell script: I want to export duplicate IPs from a text file


I have a text file containing IP addresses and timestamps. I want to export duplicate IPs to a text file if the timestamp difference between the duplicates is greater than 10 minutes. For example, something like this:

1.1.1.1 2024-03-15T13:01:13
2.2.2.2 2024-03-15T17:02:11
1.1.1.1 2024-03-15T13:15:25

For that file, I would want to export that IP (1.1.1.1), because its timestamp difference exceeds 10 minutes.

I tried the following, but the output is empty. Any help/guidance on where I'm going wrong would be appreciated:

# Function to extract IP addresses and their timestamps from a text file
function Get-IPsWithTimestamps {
    param (
        [string]$FilePath
    )

    # Read the content of the file
    $fileContent = Get-Content -Path $FilePath -Raw

    # Use regex to find IP addresses and timestamps in yy-MM-DD HH-MM-SS format
    $matches = [regex]::Matches($fileContent, "(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\s+(\d{2}-\d{2}-\d{2}\s+\d{2}-\d{2}-\d{2})")

    # Create a hashtable to store IP addresses and their timestamps
    $ipTimestamps = @{}

    # Iterate through matches and update timestamps for each IP
    foreach ($match in $matches) {
        $ip = $match.Groups[1].Value
        $timestamp = [datetime]::ParseExact($match.Groups[2].Value, "yy-MM-dd HH-mm-ss", $null)
        $ipTimestamps[$ip] += , $timestamp
    }

    return $ipTimestamps
}

# Function to filter duplicate IPs with occurrence time difference greater than 10 minutes
function Get-DuplicateIPsWithTimeDifference {
    param (
        [hashtable]$IPsWithTimestamps
    )

    $duplicateIPsWithTimeDifference = @()

    # Iterate through hashtable and filter duplicate IP addresses
    foreach ($ip in $IPsWithTimestamps.Keys) {
        $timestamps = $IPsWithTimestamps[$ip]
        if ($timestamps.Count -ge 2) {
            $firstTimestamp = $timestamps[0]
            $lastTimestamp = $timestamps[-1]
            $timeDifference = $lastTimestamp - $firstTimestamp
            if ($timeDifference.TotalMinutes -gt 10) {
                $duplicateIPsWithTimeDifference += $ip
            }
        }
    }

    return $duplicateIPsWithTimeDifference
}

# Example usage
$filePath = "C:\path\to\your\textfile.txt"
$ipTimestamps = Get-IPsWithTimestamps -FilePath $filePath
$duplicateIPsWithTimeDifference = Get-DuplicateIPsWithTimeDifference -IPsWithTimestamps $ipTimestamps

# Export unique duplicate IP addresses with time difference greater than 10 minutes to a text file
$duplicateIPsWithTimeDifference | Out-File -FilePath "C:\path\to\output\duplicate_ips_with_time_difference.txt"


powershell shell
1 Answer

Your code looks fine, it's just that the regex pattern is incorrect, specifically the DateTime capturing group:

(\d{2}-\d{2}-\d{2}\s+\d{2}-\d{2}-\d{2})

should be:

(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})

And, if you want to simplify the whole pattern, you could probably use:

((?:\d{1,3}\.){3}\d{1,3})\s+(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2})

See https://regex101.com/r/7jJy8X/1 for details.
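
As a quick sanity check, the simplified pattern captures both parts from one of the sample lines (a minimal illustration, not part of the fix itself):

$pattern = '((?:\d{1,3}\.){3}\d{1,3})\s+(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2})'
$m = [regex]::Match('1.1.1.1 2024-03-15T13:01:13', $pattern)
$m.Groups[1].Value  # 1.1.1.1
$m.Groups[2].Value  # 2024-03-15T13:01:13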

However, if the file always consists of an IP address, whitespace, a timestamp, and a newline, repeated, it's even simpler to use -split:

$ip, $timestamp = '1.1.1.1 2024-03-15T13:01:13' -split '\s+'
$ip        # 1.1.1.1
$timestamp # 2024-03-15T13:01:13
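
If you go that route, parsing the whole file could look something like this. A minimal sketch, assuming every line holds exactly one "IP timestamp" pair; $objects is a name introduced here for illustration, and $filePath is the same path variable from the question:

# parse each line with -split instead of a regex
$objects = Get-Content -Path $filePath | ForEach-Object {
    $ip, $timestamp = $_ -split '\s+'
    [pscustomobject]@{
        IpAddress = $ip
        Timestamp = $timestamp -as [datetime]  # ISO 8601 strings cast cleanly to [datetime]
    }
}

The resulting objects can then flow into the same Group-Object pipeline shown below.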

As for what to do with the code, I think you can make it much simpler by using Group-Object:

# sample data, actually comes from `Get-Content -Raw`
$fileContent = @'
1.1.1.1 2024-03-15T13:01:13
2.2.2.2 2024-03-15T17:02:11
1.1.1.1 2024-03-15T13:15:25
'@

# group the objects by IP and filter out all groups with a single object
$pattern = '((?:\d{1,3}\.){3}\d{1,3})\s+(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2})'
$groups = [regex]::Matches($fileContent, $pattern) | ForEach-Object {
    [pscustomobject]@{
        IpAddress = $_.Groups[1].Value
        Timestamp = $_.Groups[2].Value -as [datetime]
    }
} | Group-Object IpAddress | Where-Object Count -GT 1

# for each group, first sort the timestamps (if the file is actually sorted then this step could be removed)
# then get a timespan of the last and first items in the group and check if
# the time difference is greater than 10 minutes, if so, output the ip address
$limitspan = [timespan]::FromMinutes(10)
$groups | ForEach-Object {
    $sorted = $_.Group.Timestamp | Sort-Object

    # if the file is sorted, here you can use:
    # `$_.Group[-1].Timestamp - $_.Group[0].TimeStamp -gt $limitspan`
    if ($sorted[-1] - $sorted[0] -gt $limitspan) {
        $_.Name
    }
} | Out-File -FilePath 'C:\path\to\output\duplicate_ips_with_time_difference.txt'
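
With the sample data above, only 1.1.1.1 ends up in the output file: its two timestamps are 14 minutes and 12 seconds apart (over the 10-minute limit), while 2.2.2.2 occurs only once and is dropped by the Count -GT 1 filter.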