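I've been trying to fetch an img URL from an image site (pixiv) to get a reasonably large result (the input links are of the artworks type, for example:
will be used with this PHP). Retrieving the relevant link via pattern matching works without issue, but the link, even when correctly formatted, seems to throw a 403 because the site is configured to block external access (probably to conserve bandwidth).
I did stumble across an option to pass a valid "request header" to make things work: https://www.reddit.com/r/Rlanguage/comments/ytgtun/im_trying_to_use_downloadfile_but_i_get_a_403/?rdt=55917
So far, though, this doesn't seem to work (the original example is in R, and I'm using PHP to try to replicate the behavior).
My code so far looks like this (the main focus is on the PHP side; the rest is just JS to streamline things once I get it working):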
<!DOCTYPE html>
<html>
<head>
<title>Image Retrieval</title>
</head>
<body>
<form method="post" action="<?php echo $_SERVER['PHP_SELF']; ?>">
<label for="url">Enter the URL:</label>
<input type="text" id="url" name="url">
<button type="submit">Submit</button>
</form>
<?php
if ($_SERVER["REQUEST_METHOD"] == "POST") {
$url = $_POST["url"];
$options = [
'http' => [
'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0\r\n" .
"Referer: https://accounts.pixiv.net\r\n",
],
];
$context = stream_context_create($options);
$html = file_get_contents($url, false, $context);
$pattern = '/image" href="(.*?)"/'; //find downscaled master img (always in jpg format)
//$pattern = '/"original":"(.*?)"/'; //find original image (usually only works when logged in)
preg_match($pattern, $html, $matches);
$imageUrl = $matches[1];
echo '<p>Image Link: <a id="image-link" href="' . $imageUrl . '">' . $imageUrl . '</a></p>';
}
?>
<script>
var imageLink = document.getElementById("image-link");
if (imageLink) {
window.location.href = imageLink.href;
}
</script>
<!-- Autofill if querystring exists -->
<script>
var urlParams = new URLSearchParams(window.location.search);
var pixivUrl = urlParams.get('pixivurl');
if (pixivUrl) {
var urlInput = document.getElementById('url');
if (urlInput) {
urlInput.value = pixivUrl;
}
var form = document.querySelector('form');
if (form) {
form.submit();
}
}
</script>
</body>
</html>
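I'm fairly sure something specific is needed to pass the request headers correctly, but I've never used this feature, so I'm at a bit of a loss.
Thanks in advance.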
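It looks like you're trying to retrieve an image URL from Pixiv and display it on your web page. However, you're hitting a 403 Forbidden error, probably because of Pixiv's restrictions on external access.
To work around this, you can try setting additional headers in the HTTP request so that it more closely mimics a browser request: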
<!DOCTYPE html>
<html>
<head>
<title>Image Retrieval</title>
</head>
<body>
<form method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES); ?>">
<label for="url">Enter the URL:</label>
<input type="text" id="url" name="url">
<button type="submit">Submit</button>
</form>
<?php
if ($_SERVER["REQUEST_METHOD"] == "POST") {
$url = $_POST["url"];
$options = [
'http' => [
'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0\r\n" .
"Referer: https://www.pixiv.net/\r\n" .
"Accept-Language: en-US,en;q=0.5\r\n" .
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n",
],
];
$context = stream_context_create($options);
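$html = file_get_contents($url, false, $context);
$pattern = '/image" href="(.*?)"/'; // find downscaled master img (always in jpg format)
// Only print the link when the fetch succeeded and the pattern matched
if ($html !== false && preg_match($pattern, $html, $matches)) {
$imageUrl = htmlspecialchars($matches[1], ENT_QUOTES);
echo '<p>Image Link: <a id="image-link" href="' . $imageUrl . '">' . $imageUrl . '</a></p>';
} else {
echo '<p>Could not retrieve an image link from that URL.</p>';
}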
}
?>
<script>
var imageLink = document.getElementById("image-link");
if (imageLink) {
window.location.href = imageLink.href;
}
</script>
<!-- Autofill if querystring exists -->
<script>
var urlParams = new URLSearchParams(window.location.search);
var pixivUrl = urlParams.get('pixivurl');
if (pixivUrl) {
var urlInput = document.getElementById('url');
if (urlInput) {
urlInput.value = pixivUrl;
}
var form = document.querySelector('form');
if (form) {
form.submit();
}
}
</script>
</body>
</html>
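One thing worth checking: the page sends the browser to the image link with `window.location.href`, so that image request is made by the browser with its own headers, not the ones set in the stream context. If the 403 comes from that second request (an assumption; nothing above confirms which request fails), it may help to fetch the image from PHP with the same headers and look at the status code. Below is a minimal sketch of that idea. The function names `buildHeaderString`, `parseStatusCode`, and `fetchWithHeaders` are made up for illustration; only `stream_context_create`, `file_get_contents`, and `$http_response_header` are PHP built-ins.

```php
<?php
// Hypothetical helpers for illustration; not part of the original code.

/**
 * Build the "header" string for an HTTP stream context
 * from an array of name => value pairs.
 */
function buildHeaderString(array $headers): string
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return implode("\r\n", $lines) . "\r\n";
}

/**
 * Extract the status code from a line such as "HTTP/1.1 403 Forbidden".
 * Returns null when the line is not a status line.
 */
function parseStatusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

/**
 * Fetch a URL with the given headers and return [statusCode, body].
 * 'ignore_errors' makes file_get_contents() return the body on 4xx/5xx too,
 * so the status can be inspected instead of only producing a warning.
 */
function fetchWithHeaders(string $url, array $headers): array
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'GET',
            'header'        => buildHeaderString($headers),
            'ignore_errors' => true,
        ],
    ]);
    $body = @file_get_contents($url, false, $context);

    // The HTTP wrapper fills $http_response_header in the local scope.
    // After redirects it holds several status lines; keep the last one.
    $status = null;
    foreach ($http_response_header ?? [] as $line) {
        $code = parseStatusCode($line);
        if ($code !== null) {
            $status = $code;
        }
    }
    return [$status, $body === false ? '' : $body];
}
```

Usage would look something like `[$status, $bytes] = fetchWithHeaders($imageUrl, ['User-Agent' => '...', 'Referer' => 'https://www.pixiv.net/']);`. If `$status` comes back as 200, the script could output `$bytes` itself after sending `header('Content-Type: image/jpeg')` rather than redirecting the browser to the link. If it is still 403, the headers are probably not the whole story.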