CURL / screen-scraping delivery tracking details from Canada Post

Question (votes: 0, answers: 4)

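I need to fetch delivery tracking details from the Canada Post website, which does not offer an API.

I have worked out a URL that returns the tracking information correctly when entered in a browser, but I can't get the same request to work with CURL (it returns a 500 "We're Sorry" page).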


<?php
class cURL {
    public $headers = array();
    public $user_agent;
    public $compression;
    public $cookie_file;
    public $cookies;
    public $proxy;

    // PHP 5+ constructor (the PHP 4 style of naming it after the class no longer works in PHP 8).
    function __construct($cookies = TRUE, $cookie = 'cookies.txt', $compression = 'gzip', $proxy = '') {
        $this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
        $this->headers[] = 'Connection: Keep-Alive';
        $this->headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
        $this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
        $this->compression = $compression;
        $this->proxy = $proxy;
        $this->cookies = $cookies;
        if ($this->cookies == TRUE) $this->cookie($cookie);
    }

    // Remember the cookie file, creating it first if it does not exist yet.
    function cookie($cookie_file) {
        if (!file_exists($cookie_file)) {
            $handle = fopen($cookie_file, 'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions');
            // Close the file handle (the original passed the file name to fclose()).
            fclose($handle);
        }
        $this->cookie_file = $cookie_file;
    }

    // Send a GET request and return the response body.
    function get($url) {
        $process = curl_init($url);
        curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($process, CURLOPT_HEADER, 0);
        curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
        curl_setopt($process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($process, CURLOPT_TIMEOUT, 30);
        // Use the handle and the configured proxy (the original referenced an undefined $cUrl and a placeholder string).
        if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
        $return = curl_exec($process);
        curl_close($process);
        return $return;
    }

    // Send a POST request; the response headers are included in the returned string.
    function post($url, $data) {
        $process = curl_init($url);
        curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($process, CURLOPT_HEADER, 1);
        curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
        curl_setopt($process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($process, CURLOPT_TIMEOUT, 30);
        if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
        curl_setopt($process, CURLOPT_POSTFIELDS, $data);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($process, CURLOPT_POST, 1);
        $return = curl_exec($process);
        curl_close($process);
        return $return;
    }

    // Print an error message and stop the script.
    function error($error) {
        echo "cURL Error\n$error";
        die;
    }
}

$cc = new cURL();
$test = $cc->get('http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=x0x0x0x0x0x0x0&trackingType=trackPersonal');
echo $test;

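[Update] After removing the Accept header line as Tim's reply suggested, I now get a page containing "You are currently visiting our basic site. This site is intended for low-bandwidth connections, mobile devices and alternative browsers." But, again, there is no tracking information.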

php curl screen-scraping
4 Answers
1
vote

I believe the problem is this line:

$this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg'; 

Add text/html to it and you should be fine. Alternatively, just remove that header.
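
As a rough illustration of the suggestion above, the sketch below builds a header list whose Accept value includes text/html and passes it to cURL. The helper names buildHeaders() and fetchPage() are hypothetical, and the exact Accept value is only an assumption about what a browser-like request might send; it has not been checked against the Canada Post site.

```php
<?php
// Hypothetical helper: same headers as in the question, except that the
// Accept value now also lists text/html (the exact value is an assumption).
function buildHeaders(): array {
    return array(
        'Accept: text/html, image/gif, image/x-bitmap, image/jpeg, image/pjpeg',
        'Connection: Keep-Alive',
        'Content-type: application/x-www-form-urlencoded;charset=UTF-8',
    );
}

// Hypothetical helper: fetches $url with the headers above and returns the
// response body as a string, or false if the request fails.
function fetchPage(string $url) {
    $process = curl_init($url);
    curl_setopt($process, CURLOPT_HTTPHEADER, buildHeaders());
    curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($process, CURLOPT_TIMEOUT, 30);
    $body = curl_exec($process);
    curl_close($process);
    return $body;
}
```

Calling fetchPage() with the tracking URL from the question would then send the adjusted Accept header; whether the site returns the tracking details still depends on the server.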


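1
vote

I use Snoopy for screen scraping. Highly recommended.

Update: I was able to fetch that content with Snoopy (but I had to modify one simple line: 809).

Here is my code: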

<?php
    include('Snoopy.class.php');

    $http = new Snoopy();
    $http->fetch('http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=x0x0x0x0x0x0x0&trackingType=trackPersonal');

    echo $http->results;
?>

You need to download the Snoopy library and change line 809 from:

$cookie_headers .= $cookieKey."=".urlencode($cookieVal)."; ";

to:

$cookie_headers .= $cookieKey."=".$cookieVal."; ";

Voilà!
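
The answer does not say why the change helps. One plausible reason, offered here only as an assumption, is that the server expects the cookie value back exactly as it was issued, and urlencode() rewrites characters such as "/" and "=". The sketch below uses a hypothetical helper, cookieHeader(), and a made-up cookie value to show the difference between the two variants of the line.

```php
<?php
// Hypothetical helper mirroring the two variants of the Snoopy line:
// with $encode = true the value goes through urlencode(), otherwise it is
// sent unchanged.
function cookieHeader(array $cookies, bool $encode): string {
    $cookie_headers = '';
    foreach ($cookies as $cookieKey => $cookieVal) {
        $value = $encode ? urlencode($cookieVal) : $cookieVal;
        $cookie_headers .= $cookieKey . "=" . $value . "; ";
    }
    return $cookie_headers;
}

// Made-up cookie value containing characters that urlencode() rewrites.
$cookies = array('SESSION' => 'abc/123==');

echo cookieHeader($cookies, true), "\n";   // SESSION=abc%2F123%3D%3D;
echo cookieHeader($cookies, false), "\n";  // SESSION=abc/123==;
```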


1
vote

How old is this thread? Canada Post certainly does offer an API: http://sellonline.canadapost.ca/DevelopersResources/


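0
votes

To extract delivery tracking details from the Canada Post website without using an API, you can use Trace Shipments, a comprehensive tracking tool. The platform lets you monitor packages from multiple couriers in one place, which simplifies package tracking.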
