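Below is a sample of a text file. I need a per-second count of the lines that contain the word "Id", grouped by the timestamp that comes before the pipe ("|").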
2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.448|Id: 25941418
2019-02-10 12:00:03.449|Id: 25827373
2019-02-10 12:00:03.449|Id: 26102038
2019-02-10 12:00:03.449|Id: 25929358
2019-02-10 12:00:04.382 | =====================================Start fetching=====================================
2019-02-10 12:00:04.451 |
2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.426|Id: 26079643
2019-02-10 12:00:04.426|Id: 26085973
2019-02-10 12:00:04.426|Id: 26090023
2019-02-10 12:00:04.426|Id: 26130133
2019-02-10 12:00:04.426|Id: 25954018
2019-02-10 12:00:04.427|Id: 25951468
2019-02-10 12:00:04.427|Id: 26136148
2019-02-10 12:00:04.427|Id: 26103013
2019-02-10 12:00:04.427|Id: 25806433
I need output like this:
Time |Count(Id)
2019-02-10 12:00:03|5
2019-02-10 12:00:04|11
Can anyone help?
If the Id is always at the end of each line, and you don't mind the columns being the other way round, this is very simple:
grep 'Id:' /tmp/data.txt | cut -f 1 -d '.' | uniq -c
5 2019-02-10 12:00:03
11 2019-02-10 12:00:04
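grep discards every line that does not contain "Id:" (the blank and banner lines). cut keeps the part before the first dot, i.e. the time without milliseconds. uniq -c counts how many times each timestamp appears in a row. (If the file is not always in chronological order, you will also need a sort before the uniq, because uniq only merges adjacent lines.)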
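To see why the sort matters, here is a small sketch with made-up timestamps (not taken from the log above). It only counts how many groups uniq reports with and without sorting first:

```shell
# Made-up sample: the third line is out of order.
unsorted='12:00:03
12:00:04
12:00:03'

# Without sort, uniq treats the two 12:00:03 lines as separate runs,
# so it reports three groups.
groups_unsorted=$(printf '%s\n' "$unsorted" | uniq -c | wc -l)

# With sort, equal lines become adjacent and collapse into two groups.
groups_sorted=$(printf '%s\n' "$unsorted" | sort | uniq -c | wc -l)

echo "groups without sort: $((groups_unsorted))"
echo "groups with sort:    $((groups_sorted))"
```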
To swap the columns and add a pipe so that the result matches the format you asked for, you can pipe the output through sed, like this:
sed -re 's/ +([0-9]+) (.+)/\2|\1/'
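Putting the pieces together, the whole thing might look roughly like the sketch below. The function name count_ids_per_second and the shortened sample file are my own for illustration. I used sed -E rather than -r because it is accepted by both GNU and BSD sed, and I included the sort in case the input is not in order.

```shell
#!/bin/sh
# Hypothetical helper: print a per-second count of "Id:" lines in file $1.
count_ids_per_second() {
    printf 'Time |Count(Id)\n'
    grep 'Id:' "$1" |        # keep only the lines that carry an Id
        cut -d '.' -f 1 |    # keep the timestamp up to the seconds
        sort |               # make equal timestamps adjacent
        uniq -c |            # prefix each timestamp with its count
        sed -E 's/^ *([0-9]+) (.+)$/\2|\1/'   # "count time" -> "time|count"
}

# Shortened, made-up sample in the same layout as the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.449|Id: 25827373
2019-02-10 12:00:04.382 | ====Start fetching====
2019-02-10 12:00:04.451 |
2019-02-10 12:00:04.426|Id: 25713118
EOF

count_ids_per_second "$sample"
rm -f "$sample"
```

On the full file from the question you would call it as count_ids_per_second /tmp/data.txt.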
Contents of data.txt:
2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.448|Id: 25941418
2019-02-10 12:00:03.449|Id: 25827373
2019-02-10 12:00:03.449|Id: 26102038
2019-02-10 12:00:03.449|Id: 25929358
2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.426|Id: 26079643
2019-02-10 12:00:04.426|Id: 26085973
2019-02-10 12:00:04.426|Id: 26090023
2019-02-10 12:00:04.426|Id: 26130133
2019-02-10 12:00:04.426|Id: 25954018
2019-02-10 12:00:04.427|Id: 25951468
2019-02-10 12:00:04.427|Id: 26136148
2019-02-10 12:00:04.427|Id: 26103013
2019-02-10 12:00:04.427|Id: 25806433
2019-02-10 12:00:03.427|Id: 25806433
Command:
grep 'Id:' data.txt | cut -f 1 -d '.' | sort | uniq -c | awk '{print $2" "$3" | "$1}'
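The sort comes before the count so that out-of-order timestamps (like the last line of data.txt) are still grouped together.
Output: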
2019-02-10 12:00:03 | 6
2019-02-10 12:00:04 | 11
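As an alternative (not from the answers above, just a sketch), the counting can also be done in a single awk pass with an associative array, which does not depend on the input order. The function name count_ids_awk is made up, and the final sort is only there because awk's for-in loop visits keys in an unspecified order.

```shell
#!/bin/sh
# Hypothetical helper: count "Id:" lines per second in file $1 using awk only.
count_ids_awk() {
    awk -F'|' '
        $2 ~ /^Id:/ {              # only lines whose second field is an Id
            sub(/\..*/, "", $1)    # strip the milliseconds from the timestamp
            count[$1]++            # tally per second
        }
        END {
            for (t in count) print t " | " count[t]
        }
    ' "$1" | sort                  # for-in order is unspecified, so sort the result
}

# Usage: count_ids_awk data.txt
```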