Eliminating overlapping entries based on start/end values in bash

Question

I have a tab-separated file of entries, each with a start and an end position:

# Name \t Start \t End
Name1 \t 1 \t 3
Name2 \t 7 \t 9
Name3 \t 5 \t 8
Name4 \t 5 \t 6

I want to remove every line that overlaps a previous line. For this example, the desired output is:

Name1 \t 1 \t 3
Name2 \t 7 \t 9
Name4 \t 5 \t 6

What I have so far:

#!/bin/bash
while IFS=$'\n' read line; do
     # Assign variable names
     name=$(echo $line | cut -f 1)
     start=$(echo $line | cut -f 2)
     end=$(echo $line | cut -f 3)
     # I envision an if statement structured so that:
     # if [ $end < $PreviousStart ] || [ $start > $PreviousEnd ] ; then echo $line >> output.txt
done < file.txt

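This is where I get stuck, because I need to check every line of output.txt (all the previous lines of the original file) and print $line only if every line currently in output.txt satisfies the if statement. I suspect awk may offer a less roundabout solution...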

Any help is much appreciated.

Answer 1

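Assumptions:

when we read a new line, we need to test it for overlap against all previous non-overlapping lines
if the new line does not overlap any previous non-overlapping line, then ...
a) we save the new line as a new member of the set of non-overlapping lines, and
b) print the new line to stdout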

One awk idea:

awk '
BEGIN { FS=OFS="\t" }
      { for (i=1; i<=cnt; i++)                       # loop through the previously kept lines
            if ( $2 <= end[i] && $3 >= start[i] )    # closed ranges overlap, including when one fully contains the other
                 next                                # overlap found: skip this line and read the next line of input

        start[++cnt] = $2                            # new non-overlapping line: save its start and end points
        end[cnt] = $3
        print                                        # print current line to stdout
      }
' file.txt

This generates:

Name1   1       3
Name2   7       9
Name4   5       6
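
For comparison, the same bookkeeping can be done in bash itself, which stays closer to the loop in the question. This is only a minimal sketch under some assumptions: the positions are integers, the ranges are closed (so touching endpoints count as an overlap), and the input has no header line. The function name `filter_overlaps` is hypothetical. Two closed ranges are treated as overlapping when each one starts no later than the other ends, which also covers a range that fully encloses an earlier one.

```shell
#!/bin/bash
# Hypothetical helper: reads "name<TAB>start<TAB>end" lines on stdin and
# prints only the lines that do not overlap any previously printed line.
filter_overlaps() {
    local -a starts=() ends=()
    local name start end i overlap
    while IFS=$'\t' read -r name start end; do
        overlap=0
        for i in "${!starts[@]}"; do
            # Closed ranges [a,b] and [c,d] overlap when a <= d and b >= c.
            if (( start <= ends[i] && end >= starts[i] )); then
                overlap=1
                break
            fi
        done
        if (( ! overlap )); then
            # Remember the kept range and echo the line, tab-separated.
            starts+=("$start")
            ends+=("$end")
            printf '%s\t%s\t%s\n' "$name" "$start" "$end"
        fi
    done
}

# Sample data from the question, fed through a pipe instead of file.txt.
printf 'Name1\t1\t3\nName2\t7\t9\nName3\t5\t8\nName4\t5\t6\n' | filter_overlaps
```

With a real file this would be called as `filter_overlaps < file.txt`. For large inputs the awk version is likely to be noticeably faster, since a bash `while read` loop processes each line in the shell itself.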
