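The supportconfig command turned out to be very slow while creating an "update". A first check showed sed using "100% CPU" for several minutes, so I suspect something quite inefficient is going on, because sed is usually very efficient.
Here is my standalone test case (the file was copied from a temporary file created by supportconfig):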
# time sed -i -e ' s/\(.*[P|p]ass"\?:\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(.*[P|p]assword"\?:\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(.*[P|p]ass[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(.*[P|p]assword[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(.*PASS=\).*/\1*REMOVED BY SUPPORTCONFIG*/g; s/\(.*_PASSWORD[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s!\(<user_password>\).*\(</user_password>\)!\1*REMOVED BY SUPPORTCONFIG*\2!g; s/\(^ProxyUser[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(^credentials[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(secret[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\({'\''[s]*password'\''}[[:space:]]*=[[:space:]]*'\''\).*\('\'';\)/\1*REMOVED BY SUPPORTCONFIG*\2/g; s/\(.*password[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(.*password_in[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g; s/\(^echo -n\).*\(> \/sys\/kernel\/config\/target\/.*auth\/password.*\)/\1 *REMOVED BY SUPPORTCONFIG* \2/g' /tmp/uuu
real 6m53.283s
user 6m45.909s
sys 0m0.129s
# wc /tmp/uuu
12234 123937 2711538 /tmp/uuu
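So processing a 2.7 MB text file took almost 7 minutes. The only unusual thing may be that the lines are quite long.
I suspect that the .* causes a lot of backtracking during matching; maybe the programmer was just a bit lazy and did not provide a better regular expression.
While sed was running I also ran strace on it, but that basically showed only brk system calls (memory allocation).
So what causes the poor performance, and is there a way to improve it?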
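Unfortunately I cannot provide the original input file, because it appears to contain URLs that are not free to download. I can, however, try different sed invocations on the file I saved.
The sed version in use is sed-4.2.2-7.3.1.x86_64.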
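Since the original input cannot be shared, a synthetic file with long lines may help others experiment. The following is only a sketch: the generated content, the variable names (n_lines, n_fields) and the single rule picked from the command are assumptions for illustration, and it is not claimed to reproduce the slowdown to the same degree as the real file.

```shell
# Build a small synthetic input with long lines (hypothetical data, not the real file).
n_lines=20     # number of lines to generate (assumed value, increase to stress sed)
n_fields=60    # key=value pairs per line, controls the line length
input=$(mktemp)
out_greedy=$(mktemp)
out_plain=$(mktemp)

awk -v n_lines="$n_lines" -v n_fields="$n_fields" 'BEGIN {
  for (i = 1; i <= n_lines; i++) {
    line = ""
    for (j = 1; j <= n_fields; j++) line = line "key" j "=value" j " "
    print line "password = hunter" i
  }
}' > "$input"

# One rule from the command, once with and once without the leading .*
sed -e 's/\(.*password[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g' "$input" > "$out_greedy"
sed -e 's/\(password[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g' "$input" > "$out_plain"

count=$(wc -l < "$input" | tr -d ' ')
if cmp -s "$out_greedy" "$out_plain"; then same=yes; else same=no; fi
echo "lines: $count, outputs identical: $same"
rm -f "$input" "$out_greedy" "$out_plain"
```

To compare run times, each sed call can be prefixed with time, as in the test case above, after raising n_lines and n_fields.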
With [P|p] fixed to [Pp], a more readable version of the sed command looks like this:
sed -i -e '
s/\(.*[Pp]ass"\?:\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(.*[Pp]assword"\?:\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(.*[Pp]ass[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(.*[Pp]assword[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(.*PASS=\).*/\1*REMOVED BY SUPPORTCONFIG*/g;
s/\(.*_PASSWORD[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s!\(<user_password>\).*\(</user_password>\)!\1*REMOVED BY SUPPORTCONFIG*\2!g;
s/\(^ProxyUser[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(^credentials[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(secret[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\({'\''[s]*password'\''}[[:space:]]*=[[:space:]]*'\''\).*\('\'';\)/\1*REMOVED BY SUPPORTCONFIG*\2/g;
s/\(.*password[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(.*password_in[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(^echo -n\).*\(> \/sys\/kernel\/config\/target\/.*auth\/password.*\)/\1 *REMOVED BY SUPPORTCONFIG* \2/g'
Clearly, the intent is to remove sensitive information such as passwords from log files.
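As a side note on the [P|p] fix: inside a bracket expression the | is a literal character rather than alternation, so [P|p] also matches a pipe. A minimal sketch, with a shortened rule and the placeholder text REDACTED chosen only for illustration:

```shell
# Inside a bracket expression, | is a literal character, not alternation.
bracket_with_pipe() { sed -e 's/\([P|p]ass:\).*/\1 REDACTED/'; }
bracket_fixed()     { sed -e 's/\([Pp]ass:\).*/\1 REDACTED/'; }

# A line that starts with a pipe instead of P or p (made-up sample input).
with_pipe=$(printf '%s\n' '|ass: not-a-password' | bracket_with_pipe)
fixed=$(printf '%s\n' '|ass: not-a-password' | bracket_fixed)

printf '%s\n' "$with_pipe" "$fixed"
```

The first form rewrites the line even though it contains neither "Pass" nor "pass"; the second leaves it untouched.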
Note: in some cases the code block does seem to parse * as markup...
Applying the modification suggested in https://stackoverflow.com/a/77820459/6607497, I got these results:
### original version of regex
# /tmp/uuu.sh
real 6m39.697s
user 6m39.092s
sys 0m0.032s
### modified version of regex
# /tmp/uuu.sh
real 0m0.237s
user 0m0.226s
sys 0m0.009s
When the original input file is reduced to a smaller test case (about 9 kB), the effect is still there, but less dramatic:
### original regexes
# /tmp/uuu.sh /tmp/eeee
real 0m0.119s
user 0m0.115s
sys 0m0.004s
### improved regexes
# /tmp/uuu.sh /tmp/eeee
real 0m0.007s
user 0m0.007s
sys 0m0.000s
Here is the obfuscated and trimmed test input file as BASE64 (because the lines are very long); pipe it through base64 -d | gzip -d to decode:
# gzip < /tmp/eeee | base64
H4sIAH/cp2UAA9VZbVPbuBb+vP4VZ9yZ7m6JnThQyk0J99JAu2x5yRDKbKfTySi2kmiQLY8kA7nL
7m+/R/ILSYAUmnRv6zKNrJfjo+ccPedIetZuf4KOiGOSRPC5/ajnmfMM6pmS9QFL6v+dpCmV4HmJ
SDyWaCpJqNkltTXeKB154ZiGFwpSKaIs1Mo5pUNJ1ZglI1BUXrKQws+9D739/iFLsuv+vpGRSqZo
v4fNVPaHQvZ7u93+bppyFhLNRKL6QbPf677sX29t9jc3fvadQ0EiI1LSVCimhZxARDTxfR8/mDex
RGnCOY0gJeEFGVFlWnsAN3B6O6p8buDAaJIQDsckpkVdVXzqcwPnVCpUHUu7MhznX1DwhijqeJ63
5t3z3Fd5b8fHPWt3S1XBYWuo0H96E6VpPKu44tSL0Xj4w+mIhBOsO8wLR7b6aTgETa+Rl3LjGVSF
bXNsbe9w3/ttF3uhfb2uELzUYUxmBRmnAes0cOs08BsbjWH3kjBOBowzPYH9a00TC33QBBRqdfBf
Gi3u0cHigDoUCqDj5UrYyp5xxC/rkDsuoOOC6T/tuAt1+EjVDA69GRTKynlAF+pQfO5BWzyIg/Ps
H6MGz7uOucj0LUds/3soxIBIZ1tpSUm842zHVClcs6AnKW27LBkKd+ceKnlOcCW/XpJPciH+dr34
6AOff5hzvjRyISVNDy4Q8ThT+vYVEqShtpsvChcuc2Zpu8acLqrDKbJK2224gJqFY1si0hTy+Zkh
SSRk281957ADz7l+PdY6Va16/erqyleZon4o4vrzkX7tgsrQD+SkHPD0RedalPLxS1imdYcd3NJr
OEtwzuY7IwSLygoG1H4spM4xw/HlcBeGnFwaFLonJ4cuMDWw/YeEK+re2qas2dk2oA05G6IxWUz7
GhF/td5YX29uNRBhTa+xptlobnhBw1sPzhpBq9HAv7VG4NZ3tkvF8q/u0BKh7fpcy3ZEVShZaua/
82TA0X+mx2/XC3zuuE/JZ9+DA32ZMVfoQHPU/nQP8m4l/H+dqH6frxwcf/jj4UhwH8RP8ZjvyFtW
7hi9pXiltzpaqRyh8UibzyMkhkO0EhBATFNJx4YaLqlTRn6VMW06VREXBhnjGpCuCSiMTJjUhSKi
YFT3q2FnYwopJxrRjIFEEQZfRc1YHIJFSCiNFAyliKshGoeo2ExYaXxhiRdyRhMNETXxWoEWts+V
kDzK466qBscCB6XiCqeecRgj53mYUZivkySkdm6ZRm2rASZViQnGWGnydGVtr/z7M6RqUIEVSotx
/igaQ29sVNToBsrK1DQcJ4KL0aQaFVKp2bAiKRJKgQjoKYRqZmjVnxJM+svlxBTQShFEhCj1z9P2
Ckm52DfMrRnMT9TXsune+d7t0tEym105tuKHptI7m6oZcy5lzJm92RctNK/FYnPlsr1S9mOZ7kF7
LSK6nVyzxemRCSSpCRrOLUXNbk7HlKcKJiKDmI0kMXwBZNqVDV9NUUQhaMrqQcOSQBDYH4FrXIKy
cFr6un/xBM0aDEq6QNAvmc34yzwfrsYMCYFIChFToUhQrwz3AchAi9gqd8AaGtsMVi3kuCRCxuM1
oxESlJeManDw5gh+J5dk01IQUnqSxQNUWgwLMZwNJJEM1fjFODG9JnGKUIkU44TiXsP/l7/1q+/8
ZNCt5/D+NI/0bhgazi/4exZz5DeWhDyLcEYsMeDLR0wLZzWoLOtbU8aFDYmJZREzTG2Y2fhQOAmx
RY9JstB4hWimFeVDH7ol/qiW0RtDkiq/YoyRqQx9trSbytIUfR9nYWAiOo9IeiwphQklNrbkXXIy
KIxhUS+NAJuFsCvGOdK+RaRHU3QfY5NmI3hlNcGogTFyBvSHyaQ+tzGsl/vkfNt+TmUVmx6xdzfb
9tPuEe79NRyUq7YFXA2qY4BkyEaZzAW+ZQjWIqn2GIDq0B4B2P8wt0uGzicTmz87ccbNOUB+HNYu
FgdVren6Xy6oTCj/daazn1cqHJSjVct/vKAmsyQxmcAqFA4ljdDLGNKYH9WPO52pCgcTVWm5EF6c
7h+dnO/vwZuP6IHd7snpWefk+O3BuxdOigEdM5poYaevt5VMY/DOrdvjRBOKftDw1/31TW/dD7b8
DT+PEE7Pf+njc+Yb97PnMrjw6zl1RcVv3aTeYSHlgtKUcITbNwQt8VMz2vU00RmSzl6xEE3+9lZk
mOWs0u1kNpg0/cAzv5iJeUZBr9BwyWMpZc6lpmADz1N2Ts6nP11mbTxkVLqt2yyr5hbOh5U2z6q5
JgrjWxGGa24uAmtOi0BFI6w1TmIEPTW1qpmzOG9zowaBh3mGCC8o5udXTI/hQ8JZzAwdnTOpkajg
CLNKk3vXoCuZkOYgoDdFojVkaVw9uUHe0wmqhcHUZPZG/5knn4fUqk80NhpaMolZIwCbM+EffDjr
YC96nTJM/MtuzZdeYwuzq/lu01zerxDa7ZwdnO9jszkNw3fM67n7V20O/LtpyawVFtnAONOUHf76
vCQjTHmLY8/JE1wumaQtsGmOk0negnIrq8Kw2sp+AyrqfSsqyqmgr8UFTR7DWauc03Lb9tXz8Tez
19/ft65fZ4f1H8IOXze3jR9hbt86PVnmBqjYAZI8BuAmwCQMnsIuN7DLGVH330s96lniLraSsJ+Q
gbl8uYF33XfQMbdR9jLY3iph6QyDFDhL3bneuXpdXsLaolonKG4El+DUlUgo7jKxZNQqEb+tRYEA
TrO48J06ki/vJJ9gx+UlFLeec9qW5RuYEKWbK81yTZpQ5ikrWGo0uUQ1R5Km4DGzpbqeVO+X0Dna
Ozw43neOT/rd05M/Pra5CAkf42a2FjRf+Q38F9Re/P7h+P0Lx+QyfSugbYqY1eQNrfWguVV3hlWr
0zvpvO8VAh2FiaoqW0YiHaOjFG82O1ogMhFF4wKt/gfLVc0SqSIAAA==
As suspected, those .* matches perform very badly:
After removing them, performance did improve enormously (by a factor of more than 1500):
real 0m0.230s
user 0m0.225s
sys 0m0.005s
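One thing worth checking before adopting the faster form is whether it changes the output. A sketch with one rule and two made-up input lines suggests that the result is the same when the keyword occurs once per line, but differs when it occurs twice: the greedy .* skips to the last occurrence, while the form without it starts at the first.

```shell
# Original style: greedy leading .* inside the capture group.
redact_greedy() {
  sed -e 's/\(.*password[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g'
}
# Modified style: no leading .*, so the match starts at the first keyword.
redact_plain() {
  sed -e 's/\(password[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g'
}

single='db password = hunter2'              # keyword once (hypothetical sample)
double='password = a; note password = b'    # keyword twice (hypothetical sample)

greedy_single=$(printf '%s\n' "$single" | redact_greedy)
plain_single=$(printf '%s\n' "$single" | redact_plain)
greedy_double=$(printf '%s\n' "$double" | redact_greedy)
plain_double=$(printf '%s\n' "$double" | redact_plain)

printf '%s\n' "$greedy_single" "$plain_single" "$greedy_double" "$plain_double"
```

On the second sample the greedy form keeps the first value ("a") in the output, whereas the form without .* removes everything after the first "password =", so in this case the faster expression also redacts more.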
For reference, this is the canonical command that was used (I also dropped -i so that the input file is preserved for repeatable tests):
time sed -e '
s/\([Pp]ass"\?:\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\([Pp]assword"\?:\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\([Pp]ass[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\([Pp]assword[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(PASS=\).*/\1*REMOVED BY SUPPORTCONFIG*/g;
s/\(_PASSWORD[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s!\(<user_password>\).*\(</user_password>\)!\1*REMOVED BY SUPPORTCONFIG*\2!g;
s/\(^ProxyUser[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(^credentials[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(secret[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\({'\''[s]*password'\''}[[:space:]]*=[[:space:]]*'\''\).*\('\'';\)/\1*REMOVED BY SUPPORTCONFIG*\2/g;
s/\(password[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(password_in[[:space:]]*=\).*/\1 *REMOVED BY SUPPORTCONFIG*/g;
s/\(^echo -n\).*\(> \/sys\/kernel\/config\/target\/.*auth\/password.*\)/\1 *REMOVED BY SUPPORTCONFIG* \2/g
' "${1:-/tmp/uuu}" >"${2:-/dev/null}"