如何在 bash、awk 中解析文件...以更有效的方式链接程序和子程序

Question

我需要为我们实际使用的程序和子程序列表生成树视图。

我们现在有一个工作脚本（在 ksh 中），但是需要很长时间才能实现（3 到 4 小时之间），而且我很确定它可以更有效/更快地完成

实际的工作脚本使用两个文件： 1：我们实际使用的程序列表（2926行） 2：与该程序关联的所有程序和子程序的列表（23922行）

为了更好地理解，这是文件 2 的表示：

A|B C D E
B|F G
C|J K L
E|
K|Z
L|M N
Z|Y X

程序A调用子程序B、C、D、E 程序B（被A调用）调用子程序F和G 程序C（被A调用）调用子程序J K和L 等等...

从这两个文件中，我们需要编写一个 csv 文件，它代表我们在第一个文件中使用的所有程序的树视图。

这是先前表示的预期输出：

A
;B
;;F
;;G
;C
;;J
;;K
;;;Z
;;;;Y
;;;;X
;;L
;;;M
;;;N
;E

为了实现这一目标，这就是我们实际所做的:

调用 csv_tree.sh 的主脚本部分：

#!/bin/ksh
cat program_list | awk -F "|" '{print($2)}' | sort | uniq >program_list.tmp
while read -r line; do
    begin=$(echo "$line" | awk -F "|" '{print $1}')
    sh csv_tree.sh "$begin" 2>/dev/null
done <$HOME/FISCALITE/OSCAR/FICHIER_IMPORT/FIC_PRG_TREE.tmp
#delete temporary file
rm program_list.tmp

csv_tree.sh：

#!/bin/ksh
#set -x
entry_file="T_SS_PRG_N1"
output_csv_file="${1}_tree.csv"
: >"$output_csv_file"
echo $1 >>"$output_csv_file"
count=0
rank=0
chaine=$(grep -E "^$1" $entry_file | awk '{$1=""}1')
#function loop
loop() {
    rank=$((rank + 1))
    for prg in $2; do
        count=$((count + 1))
        if [[ -n $(echo ${prg} | grep ${chaine}) ]]; then
            rank=1
        fi
        #   calculate the number of ";" to print in the csv file
        separator=$(for ((i = 0; i < rank; ++i)); do echo -n ";"; done)
        echo "${separator}${prg}" >>"$output_csv_file"
        if [ $(echo ${prg} | egrep "^(X|Y|Z)") != "" ] && [ $(grep "^${prg}" ${entry_file} | awk -F\| '{print $2}' | tr -d ' ') != "" ]; then
            echo "${separator};$(grep "^${prg}" ${entry_file} | awk -F\| '{print $2}')" >>"$output_csv_file"
        else
            loop "${prg}_$count" "$(grep -E "^$prg" $entry_file | awk '{$1=""}1')"
            rank=$((rank - 1))
        fi
    done
}

#begin
loop "$1" "$chaine"

我很确定我们可以用纯 awk 来实现这一点，但我并不真正喜欢它:/

如果有人有一些建议或其他方法可以做得更好，我很乐意^^

Answer 1

在 TXR Lisp 中：

$ cat file1
A
B
C
E
L
$ cat file2
A|B C D E
B|F G
C|J K L
E|
K|Z
L|M N
Z|Y X
$ txr deps.tl file1 file2
A
;B
;;F
;;G
;C
;;J
;;K
;;;Z
;;;;Y
;;;;X
;;L
;;;M
;;;N
;D
;E
B
;F
;G
C
;J
;K
;;Z
;;;Y
;;;X
;L
;;M
;;N
E
L
;M
;N

代码如下：

(defstruct mod ()
  name
  children)

(defvarl mh (hash))

(defun resolve-names (m)
  (when (and m.children
             (stringp (first m.children)))
    (set m.children
         (append-each ((c m.children))
           (iflet ((cm [mh c]))
             (list cm))))))

(defun print-tree (m : path (prefix ""))
  (put-line `@prefix@{m.name}`)
  (unless (member m path)
    (let ((nprefix `;@prefix`)
          (npath (cons m path)))
      (each ((c m.children))
        (print-tree c npath nprefix)))))

(tree-bind (file1 file2) *args*
  (let ((used (file-get-lines "file1"))
        (deps (file-get-lines "file2")))
    ;; populate object hash with each definition
    (each ((d deps))
      (match `@left|@right` d
        ;; look up or create new node for left side
        (let ((m (or [mh left]
                     (set [mh left] (new mod name left))))
              (ch (tok #/\S+/ right)))
          ;; ensure node exists for each child
          (each ((c ch))
            (or [mh c] (set [mh c] (new mod name c))))
          ;; append children to left hand node
          (set m.children (append m.children ch)))))
    ;; convert child names to objects
    (dohash (n m mh)
      (resolve-names m))
    ;; print tree view for each used object
    (each ((u used))
      (iflet ((m [mh u]))
        (print-tree m)))))

有针对循环引用的检查

print-tree

，但不针对重复项（菱形引用）。

Answer 2

假设：

我们只为根程序生成树（根程序==不被其他程序调用的程序）
OP有多个root程序
OP 提到了两个文件 -
```
File1
```
和
```
File2
```
- 但在实际代码中使用了其他名称
我猜
```
File2
```
==
```
program_list
```
我们没有看到
```
File1
```
的内容，所以我继续操作，就好像
```
File1
```
不存在一样； OP 可能需要根据
```
File1
```

我们将复制 OP 的示例数据以包含 2 个根程序（

和

A1

）：

$ cat program_list
A|B C D E
B|F G
C|J K L
E|
K|Z
L|M N
Z|Y X
A1|B1 C1 D1 E1
B1|F1 G1
C1|J1 K1 L1
E1|
K1|Z1
L1|M1 N1
Z1|Y1 X1

虽然其他语言将会有更好的性能，但这里有一种使用

bash

的方法，它应该比 OP 的当前代码具有更好的性能（如果只是因为它不调用任何子 shell）：

$ cat build_tree.bash
#!/bin/bash

unset   pc c p r                                 # just in case script is sourced
declare -A c p r                                 # associative arrays for p(arent), c(hild) and r(oot) nodes

#######
# parse data into p[], c[] and parent_child ( pc_<parent>[] ) arrays

while read -r line
do
    IFS='[ |]' read -ra a <<< "${line}"           # a[0]==parent, a[1-n]==children

    parent="${a[0]}"

    p[$parent]=1
    declare -n pc=pc_"${parent}"                 # nameref to create pc_<parent>=( children ) arrays, eg: pc_A=( B C D E )
    pc=()

    for ((i=1 ; i<${#a[@]} ; i++))               # loop through child nodes
    do
        child="${a[$i]}"
        c[$child]=1                              # add as index in c[] array; c[] used in next loop to determine which nodes are roots
        pc+=( "$child" )                         # add child to pc == pc_<parent>[] array
    done
done < program_list                              # process File2

#######
# determine root nodes

for node in "${!p[@]}"                           # loop through p[] array indices
do
    [[ -z "${c[$node]}" ]] && r[$node]=1         # if not an index in the c[] array then add as index in the r[] array
done

unset child                                      # no longer needed so go ahead and release memory

#######
# recursive function: print a node and then process it's children (ie, contents of pc_<node>[] array)

recurse() {
    local node pfx pc child

    node="$1"                                    # eg: A, B, D1
    pfx="$2"                                     # eg: ';', ';;', ';;;'

    declare -n pc=pc_"${node}"                   # nameref for pc_<parent>[] array

    for child in "${pc[@]}"                      # loop through child nodes
    do
        echo "${pfx}${child}"                    # print child node

        # if child is also a parent then make recursive call

        [[ -n "${p[$child]}" ]] && recurse "${child}" "${pfx}${sep}"
    done
}

#######
# main step: loop through root nodes and make top-level recurse() calls

sep=';'

for root in "${!r[@]}"
do
    echo "########### new tree:"
    echo "${root}"
    recurse "${root}" "${sep}"
done

#######
# release memory ... just in case script is sourced

for parent in "${!p[@]}"
do
    declare -n pc=pc_"${parent}"                 # nameref for pc_<parent>[] array

    unset pc
done

unset c p r

注意事项：

需要
```
bash 4.2+
```
才能获得 nameref (
```
declare -n
```
) 支持

试驾：

$ ./build_tree.bash
########### new tree:
A
;B
;;F
;;G
;C
;;J
;;K
;;;Z
;;;;Y
;;;;X
;;L
;;;M
;;;N
;D
;E
########### new tree:
A1
;B1
;;F1
;;G1
;C1
;;J1
;;K1
;;;Z1
;;;;Y1
;;;;X1
;;L1
;;;M1
;;;N1
;D1
;E1

如何在 bash、awk 中解析文件...以更有效的方式链接程序和子程序

问题描述投票：0回答：2

2个回答

最新问题

如何在 bash、awk 中解析文件...以更有效的方式链接程序和子程序

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2