I have a tab-delimited file of entries, each with a start and an end position:
# Name \t Start \t End
Name1 \t 1 \t 3
Name2 \t 7 \t 9
Name3 \t 5 \t 8
Name4 \t 5 \t 6
I want to remove every line that overlaps with an earlier line. In this example, the desired output is:
Name1 \t 1 \t 3
Name2 \t 7 \t 9
Name4 \t 5 \t 6
What I have so far:
#!/bin/bash
while IFS=$'\n' read line; do
# Assign variable names
name=$(echo $line | cut -f 1)
start=$(echo $line | cut -f 2)
end=$(echo $line | cut -f 3)
# I envision an if statement structured so that:
# if [ $end < $PreviousStart ] || [ $start > $PreviousEnd ] ; then echo $line >> output.txt
done < file.txt
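This is where I am stuck: I need to check each line of output.txt (that is, all earlier lines of the original file that were kept) and print $line only if the if condition holds for every line currently in output.txt. I suspect awk may offer a less roundabout solution...
Any help is greatly appreciated.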
Assumptions:
One awk idea:
awk '
BEGIN { FS=OFS="\t" }
{ for (i=1; i<=cnt; i++)                      # loop through the saved (already printed) lines
      if ( $2 <= end[i] && $3 >= start[i] )   # closed intervals overlap when each one starts no later than the other ends;
                                              # this also catches a current line that fully contains a saved one
         next                                 # overlap found: skip this line and process the next line of input
  start[++cnt] = $2                           # new non-overlapping line, so save its start and end points
  end[cnt]     = $3
  print                                       # print current line to stdout
}
' file.txt
This generates:
Name1 1 3
Name2 7 9
Name4 5 6
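To try edge cases without touching file.txt, the same idea can be wrapped in a shell function and fed inline data. This is only a sketch under a few assumptions: the function name filter_overlaps and the extra Name5 and Name6 rows are made up for illustration, intervals are treated as closed (sharing an endpoint counts as an overlap), and every row has start <= end.

```shell
#!/bin/bash
# filter_overlaps (hypothetical name): reads tab-separated "name start end" rows
# on stdin and prints only the rows that do not overlap any row printed earlier.
# Assumes closed intervals and start <= end on every row.
filter_overlaps() {
    awk '
    BEGIN { FS=OFS="\t" }
    { for (i=1; i<=cnt; i++)
          if ( $2 <= end[i] && $3 >= start[i] )   # overlap test for closed intervals
             next                                 # overlaps a kept row: drop it
      start[++cnt] = $2                           # remember the kept interval
      end[cnt]     = $3
      print
    }'
}

# Rows from the question plus two illustrative rows:
# Name5 (12-14) touches nothing and is kept; Name6 (11-15) fully contains
# Name5 and is dropped, although neither of its endpoints lies inside 12-14.
printf '%s\t%s\t%s\n' \
    Name1 1 3 \
    Name2 7 9 \
    Name3 5 8 \
    Name4 5 6 \
    Name5 12 14 \
    Name6 11 15 | filter_overlaps
```

With this input the function should print the Name1, Name2, Name4 and Name5 rows. Note that the saved arrays grow with every kept row, so each input line is compared against all kept rows; that is fine for small files but may be slow for very large ones.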