Replace ICON references with nothing


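I have a file of more than 2600 bookmarks exported from Firefox that I want to import into Buku. There seems to be a problem related to the icons in the html file, so I want to replace the ICON references with nothing. Here is an example, the shortest one: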

ICON="data:image/png;base64,PHN2ZyB4bWxucz0naHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmcnIHdpZHRoPScxNicgaGVpZ2h0PScxNic+IDxwYXRoIGQ9J00wIDBoMTZ2MTZIMHonLz4gPHBhdGggZD0nTTEzLjk5NCAxMC4zNTZIMTVWMTJoLTMuMTcxVjcuNzQxYzAtMS4zMDgtLjQzNS0xLjgxLTEuMjktMS44MS0xLjA0IDAtMS40Ni43MzctMS40NiAxLjh2Mi42M2gxLjAwNlYxMkg2LjkxOFY3Ljc0MWMwLTEuMzA4LS40MzUtMS44MS0xLjI5MS0xLjgxLTEuMDM5IDAtMS40NTkuNzM3LTEuNDU5IDEuOHYyLjYzaDEuNDQxVjEySDF2LTEuNjQ0aDEuMDA2VjYuMDc5SDFWNC40MzVoMy4xNjh2MS4xMzlhMi41MDcgMi41MDcgMCAwIDEgMi4zLTEuMjlBMi40NTIgMi40NTIgMCAwIDEgOC45MzEgNS45MSAyLjUzNSAyLjUzNSAwIDAgMSAxMS40IDQuMjg0IDIuNDQ4IDIuNDQ4IDAgMCAxIDE0IDYuOXYzLjQ1OHonIGZpbGw9JyNmZmYnLz4gPC9zdmc+"

I have tried:

sed -e 's/^ICON=\"data:image\/png;base64,^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/][AQgw]==|[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=)?$\"$//g' firefox_bookmarks_copie.html > test1.html

sed -e 's/^ICON="[[:print:]]"$//gi' firefox_bookmarks_copie.html > test2.html

sed -e 's/^ICON="data:image(\/[^;]+;base64[^"]+)"$//g' firefox_bookmarks_copie.html > test3.html

awk '{gsub(/^ICON="[:print:]"$/,"");}' firefox_bookmarks_copie.html > copie4.html

AWK seems to give me a problem when saving to copie4.html.

perl -0pe 's/^ICON="data:image(\/[^;]+;base64[^"]+)"$//' firefox_bookmarks_copie.html >> copie5.html

The site https://regex101.com/r/sxFswz/1 seems to tell me that my replacement regex is valid:

/ICON="data:image(\/[^;]+;base64[^"]+)"/g
Can you help me?

regex linux perl awk sed
1 Answer
Assumption:

  • OP wants to remove ALL ICON="..." strings from the html file
Using the following (heavily) modified sample html file for demonstration:

$ cat bm.html
<!DOCTYPE html>
<html>
<head>
... some other stuff ...
</head>
<body>
... some other stuff ...
<DT><A HREF="https://www.inter.net/search/results/content/?abc" ICON="data:image/png;base64,PHN2ZyB...snip_#1...z4gPC9zdmc+">some description</A>
<DT><A HREF="https://www.inter.net/search/results/content/?abc" ICON="data:image/png;base64,PHN2ZyB...snip_#2...z4gPC9zdmc+">some description</A>
</body>
</html>
General approach: match the string ICON=" + [^"]* (a run of characters containing no double quote) + "

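The pattern carries no ^ or $ anchors on purpose. In the sample file the ICON attribute sits in the middle of an <A ...> tag, so a pattern that requires ICON to start the line and the closing quote to end it cannot match; that is likely why the anchored attempts in the question changed nothing. A minimal sketch of the difference, assuming a made-up one-line sample (the example.com URL and the AAAA payload are placeholders, not data from the real export):

```shell
# Hypothetical one-line sample modelled on the bookmark lines in bm.html.
line='<DT><A HREF="https://example.com/" ICON="data:image/png;base64,AAAA">some description</A>'

# Anchored pattern, as in the question's attempts: the line starts with <DT>,
# not ICON, so nothing matches and the line comes back unchanged.
anchored=$(printf '%s\n' "$line" | sed 's/^ICON="[^"]*"$//g')

# Unanchored pattern: the attribute is removed wherever it occurs in the line.
unanchored=$(printf '%s\n' "$line" | sed 's/ICON="[^"]*"//g')

printf 'anchored:   %s\n' "$anchored"
printf 'unanchored: %s\n' "$unanchored"
```

Since ICON=" must be followed directly by the quoted value, this pattern should leave other attributes alone.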
One sed idea:

$ sed 's/ICON="[^"]*"//g' bm.html
<!DOCTYPE html>
<html>
<head>
... some other stuff ...
</head>
<body>
... some other stuff ...
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
</body>
</html>
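The output above keeps the space that preceded the attribute (note the blank before >). If that matters, one possible variation is to let the pattern swallow the leading whitespace as well, write the result to a new file, and then count how many lines still contain an ICON attribute. This is only a sketch: the scratch files, the example.com URLs and the AAAA/BBBB payloads are hypothetical stand-ins for the real export.

```shell
# Hypothetical scratch files; the real input would be the exported bookmarks file.
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' \
  '<DT><A HREF="https://example.com/a" ICON="data:image/png;base64,AAAA">first</A>' \
  '<DT><A HREF="https://example.com/b" ICON="data:image/png;base64,BBBB">second</A>' > "$src"

# [[:space:]]* also consumes the blank(s) in front of the attribute.
sed 's/[[:space:]]*ICON="[^"]*"//g' "$src" > "$dst"

# Count lines that still contain an ICON attribute.
# grep -c exits non-zero when the count is 0, hence the || true.
remaining=$(grep -c 'ICON="' "$dst" || true)
result=$(cat "$dst")

printf '%s\n' "$result"
printf 'lines still containing ICON=: %s\n' "$remaining"
rm -f "$src" "$dst"
```

Writing to a separate file and checking the count before importing keeps the original export untouched in case the pattern needs adjusting.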
One awk idea:

$ awk '{gsub(/ICON="[^"]+"/,"")}1' bm.html
<!DOCTYPE html>
<html>
<head>
... some other stuff ...
</head>
<body>
... some other stuff ...
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
<DT><A HREF="https://www.inter.net/search/results/content/?abc" >some description</A>
</body>
</html>
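The trailing 1 in the awk command matters: it is an always-true pattern whose default action prints the (modified) record. The awk attempt in the question has an action block but no print, which probably explains the trouble with copie4.html, since awk then writes nothing at all. (Separately, a character class such as [:print:] is only valid inside a bracket expression, i.e. [[:print:]], and on its own matches a single character.) A small sketch of the difference, again on a made-up sample line:

```shell
# Hypothetical one-line sample; URL and payload are placeholders.
line='<DT><A HREF="https://example.com/" ICON="data:image/png;base64,AAAA">some description</A>'

# Action block only, no print statement: awk emits nothing.
without_print=$(printf '%s\n' "$line" | awk '{gsub(/ICON="[^"]*"/,"")}')

# Trailing 1: always-true pattern, default action prints the record.
with_print=$(printf '%s\n' "$line" | awk '{gsub(/ICON="[^"]*"/,"")}1')

printf 'without 1: [%s]\n' "$without_print"
printf 'with 1:    [%s]\n' "$with_print"
```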
    