如何使用PHP获取网站的favicon?

问题描述 投票:16回答:13

我希望得到,请求网站的PHP图标。我被推荐使用谷歌的favicon服务,但它不起作用。我想自己做点什么,但不知道正则表达式的用法。

我在Google上发现了一个适用于大多数情况的课程,但它的错误率却令人无法接受。你可以看看这里:http://www.controlstyle.com/articles/programming/text/php-favicon/

请问有人请帮我使用正则表达式获取图标吗?

php regex favicon
13个回答
1
投票

PHP Grab Favicon

这是一种使用许多参数从页面URL获取favicon的舒适方式。

这个怎么运作

  1. 检查favicon是否已经存在于本地或者不希望保存,如果是,则返回路径和文件名
  2. 否则加载URL并尝试将favicon位置与正则表达式匹配
  3. 如果我们匹配,那么favicon链接将是绝对的
  4. 如果我们没有favicon,我们会尝试在域根目录中获取一个
  5. 如果仍然没有图标我们随机尝试google,faviconkit和favicongrabber API
  6. 如果要保存favicon,请尝试加载favicon URL
  7. 如果希望下次保存Favicon并返回路径和文件名

所以它结合了两种方式:尝试从页面获取Favicon,如果不起作用,请使用“API”服务来回馈Favicon ;-)

<?php
/*

PHP Grab Favicon
================

> This `PHP Favicon Grabber` use a given url, save a copy (if wished) and return the image path.

How it Works
------------

1. Check if the favicon already exists local or no save is wished, if so return path & filename
2. Else load URL and try to match the favicon location with regex
3. If we have a match the favicon link will be made absolute
4. If we have no favicon we try to get one in domain root
5. If there is still no favicon we randomly try google, faviconkit & favicongrabber API
6. If favicon should be saved try to load the favicon URL
7. If wished save the Favicon for the next time and return the path & filename

How to Use
----------

```PHP
$url = 'example.com';

$grap_favicon = array(
'URL' => $url,   // URL of the Page we like to get the Favicon from
'SAVE'=> true,   // Save Favicon copy local (true) or return only favicon url (false)
'DIR' => './',   // Local Dir the copy of the Favicon should be saved
'TRY' => true,   // Try to get the Favicon frome the page (true) or only use the APIs (false)
'DEV' => null,   // Give all Debug-Messages ('debug') or only make the work (null)
);

echo '<img src="'.grap_favicon($grap_favicon).'">';
```

Todo
----
Optional split the download dir into several sub-dirs (MD5 segment of filename e.g. /af/cd/example.com.png) if there are a lot of favicons.

Infos about Favicon
-------------------
https://github.com/audreyr/favicon-cheat-sheet

###### Copyright 2019 Igor Gaffling

*/ 

$testURLs = array(
  'http://aws.amazon.com',
  'http://www.apple.com',
  'http://www.dribbble.com',
  'http://www.github.com',
  'http://www.intercom.com',
  'http://www.indiehackers.com',
  'http://www.medium.com',
  'http://www.mailchimp.com',
  'http://www.netflix.com',
  'http://www.producthunt.com',
  'http://www.reddit.com',
  'http://www.slack.com',
  'http://www.soundcloud.com',
  'http://www.stackoverflow.com',
  'http://www.techcrunch.com',
  'http://www.trello.com',
  'http://www.vimeo.com',
  'https://www.whatsapp.com/',
  'https://www.gaffling.com/',
);

foreach ($testURLs as $url) {
  $grap_favicon = array(
    'URL' => $url,   // URL of the Page we like to get the Favicon from
    'SAVE'=> true,   // Save Favicon copy local (true) or return only favicon url (false)
    'DIR' => './',   // Local Dir the copy of the Favicon should be saved
    'TRY' => true,   // Try to get the Favicon frome the page (true) or only use the APIs (false)
    'DEV' => null,   // Give all Debug-Messages ('debug') or only make the work (null)
  );
  $favicons[] = grap_favicon($grap_favicon);
}
foreach ($favicons as $favicon) {
  echo '<img title="'.$favicon.'" style="width:32px;padding-right:32px;" src="'.$favicon.'">';
}
echo '<br><br><tt>Runtime: '.round((microtime(true)-$_SERVER["REQUEST_TIME_FLOAT"]),2).' Sec.';

function grap_favicon( $options=array() ) {

  // Ini Vars
  $url       = (isset($options['URL']))?$options['URL']:'gaffling.com';
  $save      = (isset($options['SAVE']))?$options['SAVE']:true;
  $directory = (isset($options['DIR']))?$options['DIR']:'./';
  $trySelf   = (isset($options['TRY']))?$options['TRY']:true;
  $DEBUG     = (isset($options['DEV']))?$options['DEV']:null;

  // URL to lower case
    $url = strtolower($url);

    // Get the Domain from the URL
  $domain = parse_url($url, PHP_URL_HOST);

  // Check Domain
  $domainParts = explode('.', $domain);
  if(count($domainParts) == 3 and $domainParts[0]!='www') {
    // With Subdomain (if not www)
    $domain = $domainParts[0].'.'.
              $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
  } else if (count($domainParts) >= 2) {
    // Without Subdomain
        $domain = $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
    } else {
      // Without http(s)
      $domain = $url;
    }

    // FOR DEBUG ONLY
    if($DEBUG=='debug')print('<b style="color:red;">Domain</b> #'.@$domain.'#<br>');

    // Make Path & Filename
    $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.png');

    // If Favicon not already exists local
  if ( !file_exists($filePath) or @filesize($filePath)==0 ) {

    // If $trySelf == TRUE ONLY USE APIs
    if ( isset($trySelf) and $trySelf == TRUE ) {  

      // Load Page
      $html = load($url, $DEBUG);

      // Find Favicon with RegEx
      $regExPattern = '/((<link[^>]+rel=.(icon|shortcut icon|alternate icon)[^>]+>))/i';
      if ( @preg_match($regExPattern, $html, $matchTag) ) {
        $regExPattern = '/href=(\'|\")(.*?)\1/i';
        if ( isset($matchTag[1]) and @preg_match($regExPattern, $matchTag[1], $matchUrl)) {
          if ( isset($matchUrl[2]) ) {

            // Build Favicon Link
            $favicon = rel2abs(trim($matchUrl[2]), 'http://'.$domain.'/');

            // FOR DEBUG ONLY
            if($DEBUG=='debug')print('<b style="color:red;">Match</b> #'.@$favicon.'#<br>');

          }
        }
      }

      // If there is no Match: Try if there is a Favicon in the Root of the Domain
        if ( empty($favicon) ) { 
        $favicon = 'http://'.$domain.'/favicon.ico';

        // Try to Load Favicon
        if ( !@getimagesize($favicon) ) {
          unset($favicon);
        }
        }

    } // END If $trySelf == TRUE ONLY USE APIs

    // If nothink works: Get the Favicon from API
    if ( !isset($favicon) or empty($favicon) ) {

      // Select API by Random
      $random = rand(1,3);

      // Faviconkit API
      if ($random == 1 or empty($favicon)) {
        $favicon = 'https://api.faviconkit.com/'.$domain.'/16';
      }

      // Favicongrabber API
      if ($random == 2 or empty($favicon)) {
        $echo = json_decode(load('http://favicongrabber.com/api/grab/'.$domain,FALSE),TRUE);

        // Get Favicon URL from Array out of json data (@ if something went wrong)
        $favicon = @$echo['icons']['0']['src'];

      }

      // Google API (check also md5() later)
      if ($random == 3) {
        $favicon = 'http://www.google.com/s2/favicons?domain='.$domain;
      } 

      // FOR DEBUG ONLY
      if($DEBUG=='debug')print('<b style="color:red;">'.$random.'. API</b> #'.@$favicon.'#<br>');

    } // END If nothink works: Get the Favicon from API

    // Write Favicon local
    $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.png');

    // If Favicon should be saved
    if ( isset($save) and $save == TRUE ) {

      //  Load Favicon
      $content = load($favicon, $DEBUG);

      // If Google API don't know and deliver a default Favicon (World)
      if ( isset($random) and $random == 3 and 
           md5($content) == '3ca64f83fdcf25135d87e08af65e68c9' ) {
        $domain = 'default'; // so we don't save a default icon for every domain again

        // FOR DEBUG ONLY
        if($DEBUG=='debug')print('<b style="color:red;">Google</b> #use default icon#<br>');

      }

      // Write 
      $fh = @fopen($filePath, 'wb');
      fwrite($fh, $content);
      fclose($fh);

      // FOR DEBUG ONLY
        if($DEBUG=='debug')print('<b style="color:red;">Write-File</b> #'.@$filePath.'#<br>');

    } else {

      // Don't save Favicon local, only return Favicon URL
      $filePath = $favicon;
    }

    } // END If Favicon not already exists local

    // FOR DEBUG ONLY
    if ($DEBUG=='debug') {

    // Load the Favicon from local file
      if ( !function_exists('file_get_contents') ) {
      $fh = @fopen($filePath, 'r');
      while (!feof($fh)) {
        $content .= fread($fh, 128); // Because filesize() will not work on URLS?
      }
      fclose($fh);
    } else {
      $content = file_get_contents($filePath);
    }
      print('<b style="color:red;">Image</b> <img style="width:32px;" 
             src="data:image/png;base64,'.base64_encode($content).'"><hr size="1">');
  }

  // Return Favicon Url
  return $filePath;

} // END MAIN Function

/* HELPER load use curl or file_get_contents (both with user_agent) and fopen/fread as fallback */
function load($url, $DEBUG) {
  if ( function_exists('curl_version') ) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $content = curl_exec($ch);
    if ( $DEBUG=='debug' ) { // FOR DEBUG ONLY
      $http_code = curl_getinfo($ch);
      print('<b style="color:red;">cURL</b> #'.$http_code['http_code'].'#<br>');
    }
    curl_close($ch);
    unset($ch);
  } else {
    $context = array ( 'http' => array (
        'user_agent' => 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/)'),
    );
    $context = stream_context_create($context);
      if ( !function_exists('file_get_contents') ) {
      $fh = fopen($url, 'r', FALSE, $context);
      $content = '';
      while (!feof($fh)) {
        $content .= fread($fh, 128); // Because filesize() will not work on URLS?
      }
      fclose($fh);
    } else {
      $content = file_get_contents($url, NULL, $context);
    }
  }
  return $content;
}

/* HELPER: Change URL from relative to absolute */
function rel2abs( $rel, $base ) {
    extract( parse_url( $base ) );
    if ( strpos( $rel,"//" ) === 0 ) return $scheme . ':' . $rel;
    if ( parse_url( $rel, PHP_URL_SCHEME ) != '' ) return $rel;
    if ( $rel[0] == '#' or $rel[0] == '?' ) return $base . $rel;
    $path = preg_replace( '#/[^/]*$#', '', $path);
    if ( $rel[0] ==  '/' ) $path = '';
    $abs = $host . $path . "/" . $rel;
    $abs = preg_replace( "/(\/\.?\/)/", "/", $abs);
    $abs = preg_replace( "/\/(?!\.\.)[^\/]+\/\.\.\//", "/", $abs);
    return $scheme . '://' . $abs;
}

资料来源:https://github.com/gaffling/PHP-Grab-Favicon


0
投票

如果您想从特定网站检索图标,您只需从其网站的根目录中获取favicon.ico即可。像这样:

$domain = "www.example.com";
$url = "http://".$domain."/favicon.ico";
$icondata = file_get_contents($url);

... you can now do what you like with the icon data

0
投票

找到这个帖子......我写了一个WordPress插件,其中包含了很多关于检索favicon的变化。由于有很多GPL代码:http://plugins.svn.wordpress.org/wp-favicons/trunk/

它允许您运行服务器,您可以通过xml rpc请求请求图标,以便任何客户端都可以请求图标。它确实有一个插件结构,所以你可以尝试谷歌,getfavicon等...看看这些服务之一是否提供任何东西。如果没有,那么它会进入一个图标提取模式,考虑到所有的http状态(301/302/404),并最好在任何地方找到一个图标。在此之后,它使用图像库函数来检查文件内部是否真的是图像和什么样的图像(有时扩展名是错误的)并且它是可插入的,因此您可以在管道中添加图像转换后的额外功能。

http获取文件围绕我在上面看到的内容做了一些逻辑:http://plugins.svn.wordpress.org/wp-favicons/trunk/includes/server/class-http.php

但它只是管道的一部分。

一旦你深入了解它会变得非常复杂。


0
投票
$url = 'http://thamaraiselvam.strikingly.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
@$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
if (!empty($arr[0]['href'])) {
    echo "<img src=".$arr[0]['href'].">";
 }
else 
echo "<img src='".$url."/favicon.ico'>";

0
投票

我改变了一点Vivek second method并添加了this function,它看起来像这样:

<?php
        $website=$_GET['u'];
        $fevicon= getFavicon($website);
        echo '<img src="'.path_to_absolute($fevicon,$website).'"></img>';

            function getFavicon($site)
            {
            $html=file_get_contents($site);
            $dom=new DOMDocument();
            @$dom->loadHTML($html);
            $links=$dom->getElementsByTagName('link');
            $fevicon='';

            for($i=0;$i < $links->length;$i++ )
            {
                $link=$links->item($i);
                if($link->getAttribute('rel')=='icon'||$link->getAttribute('rel')=="Shortcut Icon"||$link->getAttribute('rel')=="shortcut icon")
                {
                    $fevicon=$link->getAttribute('href');
                }
            }
            return  $fevicon;
            }

    // transform to absolute path function... 
    function path_to_absolute($rel, $base)
    {
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    /* queries and anchors */
    if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;
    /* parse base URL and convert to local variables:
       $scheme, $host, $path */
    extract(parse_url($base));
    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);
    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';
    /* dirty absolute URL */
    $abs = "$host$path/$rel";
    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}
    /* absolute URL is ready! */
    return $scheme.'://'.$abs;
    }

?>

当然你用https://www.domain.tld/favicon/this_script.php?u=http://www.example.com称它仍然无法捕捉所有选项,但现在绝对路径已经解决。希望能帮助到你。


42
投票

使用谷歌提供的S2 service。就这么简单

http://www.google.com/s2/favicons?domain=www.yourdomain.com

刮这个会更容易,试图自己做。


29
投票

又快又脏:

<?php 
$url = 'http://example.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
echo $arr[0]['href'];

4
投票

它看起来像http://www.getfavicon.org/?url=domain.comFAQ)可靠地刮擦网站的favicon。我意识到这是第三方服务,但我认为这是Google favicon服务的一个有价值的替代品。


3
投票

根据Wikipedia的说法,网站可以使用两种主要方法来获取浏览器选择的图标。第一个是史蒂夫提到的,在web服务器的根目录中将图标存储为favicon.ico。第二种是通过HTML链接标记引用favicon。

为了涵盖所有这些情况,最好的想法是首先测试favicon.ico文件的存在,如果不存在,则在源代码中搜索<link rel="icon"<link rel="shortcut icon"部分(仅限于HTML头节点) )直到找到favicon。您是否选择使用正则表达式或某些other string search option(更不用说内置的PHP版本)取决于您。最后,this question可能对你有所帮助。


3
投票

我一直在做类似的事情,我用一堆URL检查了这一切,似乎一切正常。 URL不必是基本URL

function getFavicon($url){
    # make the URL simpler
    $elems = parse_url($url);
    $url = $elems['scheme'].'://'.$elems['host'];

    # load site
    $output = file_get_contents($url);

    # look for the shortcut icon inside the loaded page
    $regex_pattern = "/rel=\"shortcut icon\" (?:href=[\'\"]([^\'\"]+)[\'\"])?/";
    preg_match_all($regex_pattern, $output, $matches);

    if(isset($matches[1][0])){
        $favicon = $matches[1][0];

        # check if absolute url or relative path
        $favicon_elems = parse_url($favicon);

        # if relative
        if(!isset($favicon_elems['host'])){
            $favicon = $url . '/' . $favicon;
        }

        return $favicon;
    }

    return false;
}

2
投票

我已经实现了自己的favicon抓取器,我在这里详细介绍了另一个StackOverflow帖子的用法:Get website's favicon with JS

谢谢,让我知道它是否对你有所帮助。此外,非常感谢任何反馈。


2
投票

第一种方法,我们可以从favicon.ico搜索它,如果找到它然后它将显示其他不

<?php
        $userPath=$_POST["url"];
        $path="http://www.".$userPath."/favicon.ico";
        $header=  get_headers($path);
        if(preg_match("|200|", $header[0]))
        {
            echo '<img src="'.$path.'">';
        }
        else
        {
            echo "<span class=error>Not found</span>";
        }
    ?>

在其他方法中,您可以搜索图标并获取该图标文件

    <?php
$website=$_POST["url"];
$fevicon= getFavicon($website);
echo '<img src="http://www.'.$website.'/'.$fevicon.'">';
function getFavicon($site)
{
            $html=file_get_contents("http://www.".$site);
            $dom=new DOMDocument();
            @$dom->loadHTML($html);
            $links=$dom->getElementsByTagName('link');
            $fevicon='';

            for($i=0;$i < $links->length;$i++ )
            {
                $link=$links->item($i);
                if($link->getAttribute('rel')=='icon'||$link->getAttribute('rel')=="Shortcut Icon"||$link->getAttribute('rel')=="shortcut icon")
                {
                    $fevicon=$link->getAttribute('href');
                }
            }
            return  $fevicon;
}
?>

1
投票

看到这个答案:https://stackoverflow.com/a/22771267。这是一个易于使用的PHP类来获取favicon URL并下载它,它还提供了一些关于favicon文件类型或者如何找到favicon的信息(默认URL,<link>标签......):

<?php
require 'FaviconDownloader.class.php';
$favicon = new FaviconDownloader('https://code.google.com/p/chromium/issues/detail?id=236848');

if($favicon->icoExists){
    echo "Favicon found : ".$favicon->icoUrl."\n";

    // Saving favicon to file
    $filename = 'favicon-'.time().'.'.$favicon->icoType;
    file_put_contents($filename, $favicon->icoData);
    echo "Saved to ".$filename."\n\n";
} else {
    echo "No favicon for ".$favicon->url."\n\n";
}

$favicon->debug();
/*
FaviconDownloader Object
(
    [url] => https://code.google.com/p/chromium/issues/detail?id=236848
    [pageUrl] => https://code.google.com/p/chromium/issues/detail?id=236848
    [siteUrl] => https://code.google.com/
    [icoUrl] => https://ssl.gstatic.com/codesite/ph/images/phosting.ico
    [icoType] => ico
    [findMethod] => head absolue_full
    [error] => 
    [icoExists] => 1
    [icoMd5] => a6cd47e00e3acbddd2e8a760dfe64cdc
)
*/
?>
© www.soinside.com 2019 - 2024. All rights reserved.