Processing curl_multi_exec results while transfers are still in progress?

I'm building a simple web spider using PHP's built-in cURL multi interface. It works well. Here is the basic implementation:

<?php
$remainingTargets = ...;
$concurrency = 30;

$multiHandle = curl_multi_init();
$targets = [];
while (count($targets) < $concurrency && count($remainingTargets) > 0) {
  $target = array_shift($remainingTargets);
  $alreadyChecked = ...;
  if ($alreadyChecked !== false) {
    continue;
  }
  $curl = curl_init($target);
  curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36');
  curl_setopt($curl, CURLOPT_FAILONERROR, true);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 4);
  curl_setopt($curl, CURLOPT_TIMEOUT, 5);
  curl_multi_add_handle($multiHandle, $curl);
  $targets[$target] = $curl;
}

// Run loop for downloading
$running = null;
do {
  curl_multi_exec($multiHandle, $running);
} while ($running);

// Harvest results
foreach ($targets as $target => $curl) {
  $html = curl_multi_getcontent($curl);
  curl_multi_remove_handle($multiHandle, $curl);
  // Process this page
}
curl_multi_close($multiHandle);

// If done show results, or continue processing queue...

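However, I'm wondering whether it's possible to do the harvesting inside the "run loop" here. I imagine that would free resources sooner and perform better. It seems I want something like a C-style select, but curl_multi_select does not return the specific resource that became ready.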

php curl curl-multi
1 Answer

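I know this is old, but I'm answering because I had the same problem:

The solution seems to be curl_multi_info_read. Each call returns an array describing one completed transfer (with the keys 'msg', 'result' and 'handle'), or false when no more completed transfers are waiting, so you call it in a loop.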

$mh = curl_multi_init();

// Add CurlHandles to CurlMultiHandle
foreach ([
    'https://example.com',
    'https://example.net',
    'https://example.org',
] as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
}

do {
    // Run sub-connections
    curl_multi_exec($mh, $running);

    // Wait for activity on CurlMultiHandle
    curl_multi_select($mh);

    // Consume any completed transfers
    while ($curlMultiInfoRead = curl_multi_info_read($mh)) {
        // Check CurlHandle has not had an error
        if ($curlMultiInfoRead['result'] !== CURLE_OK) {
            throw new \RuntimeException(curl_error($curlMultiInfoRead['handle']));
        }

        // Get information on the request
        $curlGetInfo = curl_getinfo($curlMultiInfoRead['handle']);
        echo $curlGetInfo['http_code'].'<br>';
        echo $curlGetInfo['url'].'<br>';

        // Get contents of the request etc.
        $curlMultiGetContent = curl_multi_getcontent($curlMultiInfoRead['handle']);
        echo htmlentities(substr($curlMultiGetContent, 0, 100)).'<br>';

        // Remove this CurlHandle from the CurlMultiHandle, then close it
        curl_multi_remove_handle($mh, $curlMultiInfoRead['handle']);
        curl_close($curlMultiInfoRead['handle']);
    }
} while ($running > 0);

// Release the CurlMultiHandle once every transfer has been consumed
curl_multi_close($mh);

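This is particularly useful in combination with CURLMOPT_MAX_TOTAL_CONNECTIONS, which limits the total number of active connections, and with a Generator that yields each cURL response as soon as it arrives.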
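
To make that last point concrete, here is a rough sketch of how the two could be combined. It is only one possible arrangement, not a tested crawler: the function name fetchAll, its parameters, and the shape of the yielded array are all made up for illustration, and it assumes PHP 7.0.7 or later built against a libcurl that supports CURLMOPT_MAX_TOTAL_CONNECTIONS.

```php
<?php
declare(strict_types=1);

/**
 * Yield each transfer as soon as it finishes (hypothetical helper).
 *
 * @param string[] $urls           URLs to fetch
 * @param int      $maxConnections value passed to CURLMOPT_MAX_TOTAL_CONNECTIONS
 * @return Generator yields URL => ['result' => int, 'error' => string, 'code' => int, 'body' => string]
 */
function fetchAll(array $urls, int $maxConnections = 30): Generator
{
    $mh = curl_multi_init();

    // Every handle is added up front; libcurl holds back the ones over the limit
    curl_multi_setopt($mh, CURLMOPT_MAX_TOTAL_CONNECTIONS, $maxConnections);

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
    }

    try {
        do {
            // Drive all transfers forward
            $status = curl_multi_exec($mh, $running);
            if ($status !== CURLM_OK) {
                throw new RuntimeException(curl_multi_strerror($status));
            }

            // Hand over every transfer that has completed so far
            while ($info = curl_multi_info_read($mh)) {
                $ch = $info['handle'];
                $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
                $response = [
                    'result' => $info['result'],   // CURLE_OK on success
                    'error'  => curl_error($ch),
                    'code'   => curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
                    'body'   => (string) curl_multi_getcontent($ch),
                ];

                // Detach the handle before yielding. From PHP 8 it is freed once
                // unreferenced; on PHP 7 you would also call curl_close($ch) here.
                curl_multi_remove_handle($mh, $ch);

                yield $url => $response;
            }

            // Sleep until there is activity (or 1 second passes) instead of spinning
            if ($running > 0) {
                curl_multi_select($mh, 1.0);
            }
        } while ($running > 0);
    } finally {
        // Also runs if the caller stops iterating early
        curl_multi_close($mh);
    }
}
```

A caller would then iterate with something like `foreach (fetchAll($urls, 10) as $url => $response) { ... }` and check `$response['result'] === CURLE_OK` before using the body. Errors are reported in the yielded array rather than thrown, so one failed page does not abort the rest of the queue. Note that no transfers make progress while the caller is busy processing a yielded response, so heavy per-page work is better kept short or deferred.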