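I have a file of more than 2600 bookmarks exported from Firefox that I want to import into Buku, and the problem seems to come from the icons embedded in the HTML file. So I want to replace every ICON attribute with nothing. Here is an example, the shortest one: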
ICON="data:image/png;base64,PHN2ZyB4bWxucz0naHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmcnIHdpZHRoPScxNicgaGVpZ2h0PScxNic+IDxwYXRoIGQ9J00wIDBoMTZ2MTZIMHonLz4gPHBhdGggZD0nTTEzLjk5NCAxMC4zNTZIMTVWMTJoLTMuMTcxVjcuNzQxYzAtMS4zMDgtLjQzNS0xLjgxLTEuMjktMS44MS0xLjA0IDAtMS40Ni43MzctMS40NiAxLjh2Mi42M2gxLjAwNlYxMkg2LjkxOFY3Ljc0MWMwLTEuMzA4LS40MzUtMS44MS0xLjI5MS0xLjgxLTEuMDM5IDAtMS40NTkuNzM3LTEuNDU5IDEuOHYyLjYzaDEuNDQxVjEySDF2LTEuNjQ0aDEuMDA2VjYuMDc5SDFWNC40MzVoMy4xNjh2MS4xMzlhMi41MDcgMi41MDcgMCAwIDEgMi4zLTEuMjlBMi40NTIgMi40NTIgMCAwIDEgOC45MzEgNS45MSAyLjUzNSAyLjUzNSAwIDAgMSAxMS40IDQuMjg0IDIuNDQ4IDIuNDQ4IDAgMCAxIDE0IDYuOXYzLjQ1OHonIGZpbGw9JyNmZmYnLz4gPC9zdmc+"
Here is what I have already tried:
sed -e 's/^ICON=\"data:image\/png;base64,^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/][AQgw]==|[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=)?$\"$//g' firefox_bookmarks_copie.html > test1.html
sed -e 's/^ICON="[[:print:]]"$//gi' firefox_bookmarks_copie.html > test2.html
sed -e 's/^ICON="data:image(\/[^;]+;base64[^"]+)"$//g' firefox_bookmarks_copie.html > test3.html
awk '{gsub(/^ICON="[:print:]"$/,"");}' firefox_bookmarks_copie.html > copie4.html
在 copy4.html 中保存时,AWK 似乎给我带来了问题
perl -0pe 's/^ICON="data:image(\/[^;]+;base64[^"]+)"$//' firefox_bookmarks_copie.html >> copie5.html
The site https://regex101.com/r/sxFswz/1 seems to tell me that my replacement regex is valid:
/ICON="data:image(\/[^;]+;base64[^"]+)"/g
Can you help me?
Sample data containing ICON="..." strings:
$ cat bm.html
<!DOCTYPE html>
<html>
<head>
... some other stuff ...
</head>
<body>
... some other stuff ...
<DT><A HREF="https://www.inter.net/search/results/content/?abc" ICON="data:image/png;base64,PHN2ZyB...snip_#1...z4gPC9zdmc+">some description</A>
<DT><A HREF="https://www.inter.net/search/results/content/?abc" ICON="data:image/png;base64,PHN2ZyB...snip_#2...z4gPC9zdmc+">some description</A>
</body>
</html>
General approach: match the literal ICON=", then [^"]* (any run of characters containing no double quote), then the closing ".
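Why [^"]* rather than .*: a greedy .* runs to the last double quote on the line, so any attribute that follows ICON in the same tag would be deleted along with it. A small comparison on a hypothetical line (the TAGS attribute after ICON is made up for illustration):

```shell
# Hypothetical tag with another attribute after ICON (illustration only).
line='<A HREF="https://example.com/" ICON="data:abc" TAGS="x">desc</A>'

# Greedy .* extends to the LAST double quote on the line, swallowing TAGS="x" too.
greedy=$(printf '%s\n' "$line" | sed 's/ICON=".*"//')

# [^"]* cannot cross a double quote, so the match stops at the quote closing ICON.
negated=$(printf '%s\n' "$line" | sed 's/ICON="[^"]*"//')

printf 'greedy:  %s\nnegated: %s\n' "$greedy" "$negated"
```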
One sed idea:
$ sed 's/ICON="[^"]*"//g' bm.html
<!DOCTYPE html>
<html>
<head>
... some other stuff ...
</head>
<body>
... some other stuff ...
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
</body>
</html>
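The sed command above writes to standard output and leaves a cosmetic blank before >. A sketch of a variant, assuming you want to rewrite the file in place while keeping a backup: it also removes the space preceding ICON, then counts the lines that still contain ICON=". The input is a small made-up stand-in for bm.html written to a temporary directory.

```shell
# Hypothetical two-bookmark stand-in for bm.html, created in a temp directory.
dir=$(mktemp -d)
cat > "$dir/bm.html" <<'EOF'
<DT><A HREF="https://www.inter.net/a" ICON="data:image/png;base64,AAAA">one</A>
<DT><A HREF="https://www.inter.net/b" ICON="data:image/png;base64,BBBB">two</A>
EOF

# -i.bak edits in place and keeps the original as bm.html.bak (the attached
# suffix form is accepted by both GNU and BSD/macOS sed).
# Matching ' ICON=' with its leading space avoids leaving a blank before '>'.
sed -i.bak 's/ ICON="[^"]*"//g' "$dir/bm.html"

# grep -c prints 0 but exits 1 when nothing matches, hence the '|| true'.
remaining=$(grep -c 'ICON="' "$dir/bm.html" || true)
echo "lines still containing ICON: $remaining"
```

On the real export the core step would be the single line sed -i.bak 's/ ICON="[^"]*"//g' firefox_bookmarks_copie.html.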
One awk idea:
$ awk '{gsub(/ICON="[^"]+"/,"")}1' bm.html
<!DOCTYPE html>
<html>
<head>
... some other stuff ...
</head>
<body>
... some other stuff ...
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
</body>
</html>
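The trailing 1 is what makes this awk command print anything: 1 is an always-true pattern, and a pattern with no action prints the current line. The awk attempt in the question has no print at all, so awk runs gsub() on every line but writes nothing, which would explain the trouble with copie4.html. A sketch on a hypothetical two-line sample contrasting the two; it also uses the return value of gsub() (the number of substitutions made) to report how many ICON attributes were removed:

```shell
# Hypothetical sample; only the first line carries an ICON attribute.
input='<DT><A HREF="https://example.com/a" ICON="data:image/png;base64,AAAA">a</A>
<DT><A HREF="https://example.com/b">b</A>'

# No print action: gsub() runs, but awk outputs nothing at all.
silent=$(printf '%s\n' "$input" | awk '{gsub(/ICON="[^"]*"/, "")}')

# With the trailing 1 each (modified) line is printed; the gsub() counts are
# summed in n and reported on stderr so they do not mix with the HTML output.
cleaned=$(printf '%s\n' "$input" | awk '
  { n += gsub(/ ICON="[^"]*"/, "") }
  1
  END { print n + 0, "ICON attribute(s) removed" > "/dev/stderr" }
')
printf '%s\n' "$cleaned"
```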