How to use awk to print multiple lines based on match and filter conditions

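I am trying to remove log entries that are more than 7 days older than the newest entry. For example, if the last entry in the log is dated 2024-02-13, all entries from 2024-02-05 should be removed.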
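
As a rough illustration of that rule (not part of the original script), the cutoff day can be worked out from the newest day and compared as plain text. This sketch assumes GNU date for the -d option, and cutoff_day is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: the oldest calendar day to keep is 7 days before the newest day.
# Assumes GNU date (-d); cutoff_day is a hypothetical helper name.
cutoff_day() {
    # TZ=UTC avoids daylight-saving surprises in the date arithmetic
    TZ=UTC date -d "$1 7 days ago" +%F
}

cutoff=$(cutoff_day "2024-02-13")
echo "cutoff: $cutoff"

# ISO dates (YYYY-MM-DD) sort lexicographically, so a string comparison works
for day in 2024-02-05 2024-02-06; do
    if [ "$day" \< "$cutoff" ]; then
        echo "$day: remove"
    else
        echo "$day: keep"
    fi
done
```

With 2024-02-13 as the newest day, the cutoff is 2024-02-06, so 2024-02-05 is removed and 2024-02-06 is kept, which matches the expected output shown further down.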

Sample log file:

* Server Name: png9iwb4a
* Date and Time: 2024-02-05 23:00:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: png9iwb4a
* Date and Time: 2024-02-05 23:30:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: png9iwb4a
* Date and Time: 2024-02-06 00:00:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

.
.
.

* Server Name: png9iwb4a
* Date and Time: 2024-02-13 23:00:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: png9iwb4a
* Date and Time: 2024-02-13 23:30:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

Expected output:

* Server Name: png9iwb4a
* Date and Time: 2024-02-06 00:00:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

.
.
.

* Server Name: png9iwb4a
* Date and Time: 2024-02-13 23:00:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: png9iwb4a
* Date and Time: 2024-02-13 23:30:01
* Total Number of Child (PID): 5
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

I did try using awk, but it did not do what I wanted: it just emptied the entire log file.

My code is below:

#!/bin/bash

LOG_FILE="logs/worker_count.log"
TMP_FILE="$LOG_FILE.tmp"

# Extract the newest and oldest dates from the log file
newest_date=$(grep -oP 'Date and Time: \K[^\n]+' "$LOG_FILE" | tail -n 1)
oldest_date=$(grep -oP 'Date and Time: \K[^\n]+' "$LOG_FILE" | head -n 1)

# Check if either date is empty
if [ -z "$newest_date" ] || [ -z "$oldest_date" ]; then
    echo "Error: Unable to extract dates from the log file."
    exit 1
fi

# Convert dates to timestamps for comparison
newest_timestamp=$(date -d "$newest_date" +"%s")
oldest_timestamp=$(date -d "$oldest_date" +"%s")

# Calculate the difference in seconds
time_difference=$((newest_timestamp - oldest_timestamp))

# If the difference is greater than or equal to 7 days (604800 seconds)
if [ "$time_difference" -ge 604800 ]; then
        # Calculate the cutoff date based on the newest date minus 7 days
        cutoff_date=$(date -d "@$((newest_timestamp - 604800))" +"%Y-%m-%d %T")

        # Extract entries within the specified date range and remove old entries
        awk -v cutoff="$cutoff_date" '/^(\* Server Name:|\* Date and Time:)/ {
        server_name = $NF
        getline datetime
        if (datetime >= cutoff) {
          print "* Server Name: " server_name
          print datetime
          for (i = 1; i <= 5; i++) {
            getline line
            print line
          }
        }
      }
    ' "$LOG_FILE" > "$TMP_FILE"

        # Replace the original log file with the filtered entries
        #mv "$TMP_FILE" "$LOG_FILE"
else
        echo "No need to remove old entries. Time difference is less than 7 days."
fi

Can someone help me?

1 answer

If you have tac available, reading the file backwards may be more efficient:

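tac "$LOG_FILE" |
awk -v RS= '
    # Paragraph mode: each blank-line-separated block is one record.
    # After tac, a block ends with "<date> <time> * Server Name: <name>",
    # so $(NF-5) is the date and $(NF-4) is the time.
    /Date and Time/ {
        # First (newest) block: ask GNU date for the timestamp 7 days earlier
        if (!cutoff)
            "date --date \""$(NF-4)" "$(NF-5)" - 7 days\" +%F%T" | getline cutoff
        else
            if ($(NF-5)$(NF-4) < cutoff) exit
    }
    { print ORS $0 }
' |
tac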
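
Two observations may help here. In the question's script, getline datetime stores the whole line ("* Date and Time: ..."), and in a byte-wise comparison a string starting with "*" sorts before a cutoff starting with a digit, so datetime >= cutoff is most likely never true and nothing is printed. Also, the pipeline above appears to derive the cutoff from the newest entry's full timestamp, whereas the expected output keeps the 2024-02-06 00:00:01 entry, which suggests a per-day cutoff. A minimal single-pass sketch under that per-day assumption follows; filter_recent is a hypothetical helper name, and GNU date is assumed:

```shell
#!/bin/bash
# Sketch only: filter_recent is a hypothetical helper, not code from the answer.
# Assumes GNU date (-d) and blank-line-separated blocks as in the sample log.
filter_recent() {
    log_file=$1

    # Date field of the last "Date and Time" line, e.g. 2024-02-13
    newest_day=$(awk '/Date and Time:/ { day = $(NF-1) } END { print day }' "$log_file")
    if [ -z "$newest_day" ]; then
        echo "Error: no dates found in $log_file" >&2
        return 1
    fi

    # Oldest calendar day to keep; TZ=UTC sidesteps daylight-saving shifts
    cutoff_day=$(TZ=UTC date -d "$newest_day 7 days ago" +%F)

    # An empty RS puts awk in paragraph mode: every block is one record.
    # Compare only the YYYY-MM-DD text, which sorts correctly as a string.
    awk -v cutoff="$cutoff_day" '
        BEGIN { RS = ""; ORS = "\n\n" }
        match($0, /Date and Time: [0-9-]+/) {
            day = substr($0, RSTART + 15, RLENGTH - 15)
            if (day >= cutoff) print
        }
    ' "$log_file"
}

# Example call (paths taken from the question):
# filter_recent logs/worker_count.log > logs/worker_count.log.tmp
```

Unlike the tac version, this reads the file twice (once to find the newest day, once to filter) and does not stop early, so it trades some speed for keeping whole days. Blocks without a "Date and Time" line are dropped, and the output ends with a blank line after the last block.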