Sorry, but I only speak a little English.
I am using this:
<?php
function file_get_contents_curl ( $url ) {
$ch = curl_init ();
curl_setopt ( $ch, CURLOPT_AUTOREFERER, TRUE );
curl_setopt ( $ch, CURLOPT_HEADER, 0 );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt ( $ch, CURLOPT_URL, $url );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, TRUE );
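curl_setopt ( $ch, CURLOPT_SSL_VERIFYPEER, 0 ); // skips certificate verification (insecure; testing only)
curl_setopt ( $ch, CURLOPT_SSL_VERIFYHOST, 0 ); // skips host name verification (insecure; testing only)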
curl_setopt ( $ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; rv:71.0) Gecko/20100101 Firefox/71.0' ); // spoof
$data = curl_exec ( $ch );
curl_close ( $ch );
return $data;
}
include ( __DIR__ . '/simplehtmldom_1_9_1/simple_html_dom.php' );
// 1. OK: $url = 'https://www.p***hub.com/model/ashley-porner';
// 2. OK: $url = 'https://www.p***hub.com/model/ashley-diamond-and-diamond-king';
// 3. NOT OK: $url = 'https://www.p***hub.com/model/ambercashh';
// 4. NOT OK: $url = 'https://www.p***hub.com/model/autumn-raine';
$html = file_get_contents_curl ( $url );
$html = str_get_html ( $html );
var_dump ( $html ); // boolean(false) if NOT OK
?>
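URLs 1 and 2 work, but URLs 3 and 4 do not. Nothing is displayed, and str_get_html returns false.
I tried changing MAX_FILE_SIZE from 600000 to 6000000 (in ~/simplehtmldom_1_9_1/simple_html_dom.php), but with the new value the page takes much longer to load and, worse, my site crashes: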
// OLD: defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 600000);
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 6000000); // NEW
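Because the library line quoted above only defines MAX_FILE_SIZE when it is not already defined, the limit can presumably also be raised from the calling script, before the include, without editing simple_html_dom.php. A minimal sketch of that guard behaviour, with the library's line reproduced inline so the snippet runs on its own (in a real script the second statement would be the include of simple_html_dom.php):

```php
<?php
// The calling script defines the limit first (value taken from the question).
define( 'MAX_FILE_SIZE', 6000000 );

// Stand-in for the guard inside simple_html_dom.php, copied from the line above.
// The constant already exists, so the right-hand define() is never evaluated.
defined( 'MAX_FILE_SIZE' ) || define( 'MAX_FILE_SIZE', 600000 );

echo MAX_FILE_SIZE, PHP_EOL; // prints 6000000
```

This only moves where the limit is set; a larger limit still means more memory and parse time per page.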
What is wrong?
Thanks.
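As a test, you can run the following. You will obviously need to edit the URL, but it shows reasonable performance, so the cause of the out-of-memory problem must lie in code that was not included in the question.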
<?php
function file_get_contents_curl ( $url ) {
$ch = curl_init ();
curl_setopt ( $ch, CURLOPT_AUTOREFERER, TRUE );
curl_setopt ( $ch, CURLOPT_HEADER, 0 );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt ( $ch, CURLOPT_URL, $url );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, TRUE );
curl_setopt ( $ch, CURLOPT_SSL_VERIFYPEER, 0 );
curl_setopt ( $ch, CURLOPT_SSL_VERIFYHOST, 0 );
curl_setopt ( $ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; rv:71.0) Gecko/20100101 Firefox/71.0' ); // spoof
$data = curl_exec ( $ch );
curl_close ( $ch );
return $data;
}
$start=time();
$memstart=memory_get_usage();
$baseurl='https://www.*******.com/model/';
$models=['ashley-porner','ashley-diamond-and-diamond-king','ambercashh','autumn-raine'];
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->recover=true;
$dom->strictErrorChecking=false;
/* do some expensive DOM operations to test performance */
$query='//section[ @class="topProfileHeader" ]/div/div/div[ @class="content-columns" ]/div[ @class="infoPiece" ]';
foreach( $models as $model ){
$url = $baseurl . $model;
$res = file_get_contents_curl( $url );
if( !$res ) continue; /* skip this model if the download failed or came back empty */
$dom->loadHTML( $res );
$xp=new DOMXPath( $dom );
libxml_clear_errors();
$col=$xp->query( $query );
if( $col->length > 0 ){
foreach( $col as $node ) {
echo str_repeat( '.', strlen( $node->nodeValue ) ) . '<br />';
}
}
}
$memory=memory_get_usage() - $memstart;
printf(
'<div style="padding:1rem; border:1px solid red;">Script took approx: %ss - consumed: %sMB, Peak memory consumption: %sMB</div>',
( time() - $start ),
round( $memory / pow(1024,2), 2 ),
round( memory_get_peak_usage() / pow(1024,2), 2 )
);
?>
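To see why str_get_html returned false for URLs 3 and 4 in the question, it may help to inspect the downloaded string before parsing it. As far as I can tell from the 1.9.1 source, str_get_html returns false when the input is empty or longer than MAX_FILE_SIZE, which matches the behaviour described. The sketch below assumes that rule; describe_download is a hypothetical helper, not part of simple_html_dom, and the sample strings are synthetic stand-ins for real downloads.

```php
<?php
/**
 * Hypothetical helper: reports why a downloaded page would be rejected
 * before parsing, assuming the "empty or longer than the limit" rule.
 * $limit defaults to the library's original value quoted in the question.
 */
function describe_download( $html, $limit = 600000 ) {
    if ( $html === false ) {
        return 'download failed'; // curl_exec returns false on error
    }
    $bytes = strlen( $html );
    if ( $bytes === 0 ) {
        return 'empty response';
    }
    if ( $bytes > $limit ) {
        return "too large: $bytes bytes exceeds limit of $limit";
    }
    return "ok: $bytes bytes";
}

// Synthetic inputs; in practice pass the result of file_get_contents_curl( $url ).
echo describe_download( str_repeat( 'x', 700000 ) ), PHP_EOL; // too large: 700000 bytes exceeds limit of 600000
echo describe_download( '<html></html>' ), PHP_EOL;           // ok: 13 bytes
```

If the failing pages turn out to be just over 600000 bytes, a modest increase of the limit, or the DOMDocument approach above, is likely to be cheaper than a tenfold increase.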