我需要为我们实际使用的程序和子程序列表生成树视图。
我们现在有一个工作脚本(在 ksh 中),但是需要很长时间才能实现(3 到 4 小时之间),而且我很确定它可以更有效/更快地完成
实际的工作脚本使用两个文件: 1:我们实际使用的程序列表(2926行) 2:与该程序关联的所有程序和子程序的列表(23922行)
为了更好地理解,这是文件 2 的表示:
A|B C D E
B|F G
C|J K L
E|
K|Z
L|M N
Z|Y X
程序A调用子程序B、C、D、E 程序B(被A调用)调用子程序F和G 程序C(被A调用)调用子程序J K和L 等等...
从这两个文件中,我们需要编写一个 csv 文件,它代表我们在第一个文件中使用的所有程序的树视图。
这是先前表示的预期输出:
A
;B
;;F
;;G
;C
;;J
;;K
;;;Z
;;;;Y
;;;;X
;;L
;;;M
;;;N
;E
为了实现这一目标,这就是我们实际所做的:
调用 csv_tree.sh 的主脚本部分:
#!/bin/ksh
cat program_list | awk -F "|" '{print($2)}' | sort | uniq >program_list.tmp
while read -r line; do
begin=$(echo "$line" | awk -F "|" '{print $1}')
sh csv_tree.sh "$begin" 2>/dev/null
done <$HOME/FISCALITE/OSCAR/FICHIER_IMPORT/FIC_PRG_TREE.tmp
#delete temporary file
rm program_list.tmp
csv_tree.sh:
#!/bin/ksh
#set -x
entry_file="T_SS_PRG_N1"
output_csv_file="${1}_tree.csv"
: >"$output_csv_file"
echo $1 >>"$output_csv_file"
count=0
rank=0
chaine=$(grep -E "^$1" $entry_file | awk '{$1=""}1')
#function loop
loop() {
rank=$((rank + 1))
for prg in $2; do
count=$((count + 1))
if [[ -n $(echo ${prg} | grep ${chaine}) ]]; then
rank=1
fi
# calculate the number of ";" to print in the csv file
separator=$(for ((i = 0; i < rank; ++i)); do echo -n ";"; done)
echo "${separator}${prg}" >>"$output_csv_file"
if [ $(echo ${prg} | egrep "^(X|Y|Z)") != "" ] && [ $(grep "^${prg}" ${entry_file} | awk -F\| '{print $2}' | tr -d ' ') != "" ]; then
echo "${separator};$(grep "^${prg}" ${entry_file} | awk -F\| '{print $2}')" >>"$output_csv_file"
else
loop "${prg}_$count" "$(grep -E "^$prg" $entry_file | awk '{$1=""}1')"
rank=$((rank - 1))
fi
done
}
#begin
loop "$1" "$chaine"
我很确定我们可以用纯 awk 来实现这一点,但我并不真正喜欢它:/
如果有人有一些建议或其他方法可以做得更好,我很乐意^^
在 TXR Lisp 中:
$ cat file1
A
B
C
E
L
$ cat file2
A|B C D E
B|F G
C|J K L
E|
K|Z
L|M N
Z|Y X
$ txr deps.tl file1 file2
A
;B
;;F
;;G
;C
;;J
;;K
;;;Z
;;;;Y
;;;;X
;;L
;;;M
;;;N
;D
;E
B
;F
;G
C
;J
;K
;;Z
;;;Y
;;;X
;L
;;M
;;N
E
L
;M
;N
代码如下:
(defstruct mod ()
name
children)
(defvarl mh (hash))
(defun resolve-names (m)
(when (and m.children
(stringp (first m.children)))
(set m.children
(append-each ((c m.children))
(iflet ((cm [mh c]))
(list cm))))))
(defun print-tree (m : path (prefix ""))
(put-line `@prefix@{m.name}`)
(unless (member m path)
(let ((nprefix `;@prefix`)
(npath (cons m path)))
(each ((c m.children))
(print-tree c npath nprefix)))))
(tree-bind (file1 file2) *args*
(let ((used (file-get-lines "file1"))
(deps (file-get-lines "file2")))
;; populate object hash with each definition
(each ((d deps))
(match `@left|@right` d
;; look up or create new node for left side
(let ((m (or [mh left]
(set [mh left] (new mod name left))))
(ch (tok #/\S+/ right)))
;; ensure node exists for each child
(each ((c ch))
(or [mh c] (set [mh c] (new mod name c))))
;; append children to left hand node
(set m.children (append m.children ch)))))
;; convert child names to objects
(dohash (n m mh)
(resolve-names m))
;; print tree view for each used object
(each ((u used))
(iflet ((m [mh u]))
(print-tree m)))))
有针对循环引用的检查
print-tree
,但不针对重复项(菱形引用)。
假设:
File1
和 File2
- 但在实际代码中使用了其他名称File2
==program_list
File1
的内容,所以我继续操作,就好像 File1
不存在一样; OP 可能需要根据 File1
我们将复制 OP 的示例数据以包含 2 个根程序(
A
和 A1
):
$ cat program_list
A|B C D E
B|F G
C|J K L
E|
K|Z
L|M N
Z|Y X
A1|B1 C1 D1 E1
B1|F1 G1
C1|J1 K1 L1
E1|
K1|Z1
L1|M1 N1
Z1|Y1 X1
虽然其他语言将会有更好的性能,但这里有一种使用
bash
的方法,它应该比 OP 的当前代码具有更好的性能(如果只是因为它不调用任何子 shell):
$ cat build_tree.bash
#!/bin/bash
unset pc c p r # just in case script is sourced
declare -A c p r # associative arrays for p(arent), c(hild) and r(oot) nodes
#######
# parse data into p[], c[] and parent_child ( pc_<parent>[] ) arrays
while read -r line
do
IFS='[ |]' read -ra a <<< "${line}" # a[0]==parent, a[1-n]==children
parent="${a[0]}"
p[$parent]=1
declare -n pc=pc_"${parent}" # nameref to create pc_<parent>=( children ) arrays, eg: pc_A=( B C D E )
pc=()
for ((i=1 ; i<${#a[@]} ; i++)) # loop through child nodes
do
child="${a[$i]}"
c[$child]=1 # add as index in c[] array; c[] used in next loop to determine which nodes are roots
pc+=( "$child" ) # add child to pc == pc_<parent>[] array
done
done < program_list # process File2
#######
# determine root nodes
for node in "${!p[@]}" # loop through p[] array indices
do
[[ -z "${c[$node]}" ]] && r[$node]=1 # if not an index in the c[] array then add as index in the r[] array
done
unset child # no longer needed so go ahead and release memory
#######
# recursive function: print a node and then process it's children (ie, contents of pc_<node>[] array)
recurse() {
local node pfx pc child
node="$1" # eg: A, B, D1
pfx="$2" # eg: ';', ';;', ';;;'
declare -n pc=pc_"${node}" # nameref for pc_<parent>[] array
for child in "${pc[@]}" # loop through child nodes
do
echo "${pfx}${child}" # print child node
# if child is also a parent then make recursive call
[[ -n "${p[$child]}" ]] && recurse "${child}" "${pfx}${sep}"
done
}
#######
# main step: loop through root nodes and make top-level recurse() calls
sep=';'
for root in "${!r[@]}"
do
echo "########### new tree:"
echo "${root}"
recurse "${root}" "${sep}"
done
#######
# release memory ... just in case script is sourced
for parent in "${!p[@]}"
do
declare -n pc=pc_"${parent}" # nameref for pc_<parent>[] array
unset pc
done
unset c p r
注意事项:
bash 4.2+
才能获得 nameref (declare -n
) 支持试驾:
$ ./build_tree.bash
########### new tree:
A
;B
;;F
;;G
;C
;;J
;;K
;;;Z
;;;;Y
;;;;X
;;L
;;;M
;;;N
;D
;E
########### new tree:
A1
;B1
;;F1
;;G1
;C1
;;J1
;;K1
;;;Z1
;;;;Y1
;;;;X1
;;L1
;;;M1
;;;N1
;D1
;E1