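I have a list of URLs, and I need to save the page titles in another list. wget or curl seems like the right way to go, but I don't know exactly how. Can you help me? Thanks.
Do you mean something like this?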
wget_title_from_filelist.sh
#!/bin/bash
while read -r URL; do
echo -n "$URL --> "
wget -q -O - "$URL" | \
tr "\n" " " | \
sed 's|.*<title>\([^<]*\).*</head>.*|\1|;s|^\s*||;s|\s*$||'
echo
done
filelist.txt
https://stackoverflow.com
https://cnn.com
https://reddit.com
https://archive.org
Usage
./wget_title_from_filelist.sh < filelist.txt
Output
https://stackoverflow.com --> Stack Overflow - Where Developers Learn, Share, & Build Careers
https://cnn.com --> CNN International - Breaking News, US News, World News and Video
https://reddit.com --> reddit: the front page of the internet
https://archive.org --> Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine
Explanation
tr "\n" " " # remove \n, create one line of input for sed
sed 's|.*<title>\([^<]*\).*</head>.*|\1|; # find <title> in <head>
s|^\s*||; # remove leading spaces
s|\s*$||' # remove trailing spaces
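
To see what the tr and sed steps do without touching the network, here is a minimal sketch that runs the same pipeline on an inline HTML sample. The function name extract_title and the variable sample_html are hypothetical names introduced only for this illustration, and [[:space:]] is used in place of \s on the assumption that it is more portable across sed implementations.

```shell
#!/bin/bash
# extract_title (hypothetical helper): reads HTML on stdin and prints the
# text of the <title> element that appears before </head>.
extract_title() {
    tr '\n' ' ' |
    sed 's|.*<title>\([^<]*\).*</head>.*|\1|;s|^[[:space:]]*||;s|[[:space:]]*$||'
}

# Sample document standing in for what wget -q -O - "$URL" would download.
sample_html='<html>
<head>
<title>
  Example Page
</title>
</head>
<body><p>Hello</p></body>
</html>'

printf '%s\n' "$sample_html" | extract_title
echo   # tr removed every newline, so end the line explicitly
```

To keep the results in a second list, as the question asks, the script's output can be redirected to a file, for example ./wget_title_from_filelist.sh < filelist.txt > titles.txt (titles.txt is an assumed file name).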