Applying request headers with cURL


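I have been trying to get the image URL from an image site (pixiv) so that I can fetch reasonably large versions of the images. The input is an artwork-type link that is passed to this PHP script, for example:

https://www.pixiv.net/en/artworks/116849074

Retrieving the relevant link through pattern matching works without problems, but the link returns a 403 even when it is correctly formatted. The site appears to be configured to block external access (probably to save bandwidth).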

I did stumble upon a suggestion to pass valid "request headers" to make this work: https://www.reddit.com/r/Rlanguage/comments/ytgtun/im_trying_to_use_downloadfile_but_i_get_a_403/?rdt=55917

So far, however, this does not seem to work. (The original example is in R; I am using PHP to try to replicate the behavior.)

My code so far looks like this (the main focus is the PHP side; the rest is just JS to simplify things once I get it working):

<!DOCTYPE html>
<html>
<head>
    <title>Image Retrieval</title>
</head>
<body>
    <form method="post" action="<?php echo $_SERVER['PHP_SELF']; ?>">
        <label for="url">Enter the URL:</label>
        <input type="text" id="url" name="url">
        <button type="submit">Submit</button>
    </form>
    <?php
    if ($_SERVER["REQUEST_METHOD"] == "POST") {
        $url = $_POST["url"];
        $options = [
            'http' => [
                'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0\r\n" .
                            "Referer: https://accounts.pixiv.net\r\n",
            ],
        ];
        $context = stream_context_create($options);
        $html = file_get_contents($url, false, $context);
        $pattern = '/image" href="(.*?)"/';   //find downscaled master img (always in jpg format)
        //$pattern = '/"original":"(.*?)"/'; //find original image (usually only works when logged in)
        preg_match($pattern, $html, $matches);
        $imageUrl = $matches[1];

        echo '<p>Image Link: <a id="image-link" href="' . $imageUrl . '">' . $imageUrl . '</a></p>';
    }
    ?>

    <script>
        var imageLink = document.getElementById("image-link");
        if (imageLink) {
            window.location.href = imageLink.href;
        }
    </script>

    <!-- Autofill if querystring exists -->
    <script>
        var urlParams = new URLSearchParams(window.location.search);
        var pixivUrl = urlParams.get('pixivurl');
        if (pixivUrl) {
            var urlInput = document.getElementById('url');
            if (urlInput) {
                urlInput.value = pixivUrl;
            }
            var form = document.querySelector('form');
            if (form) {
                form.submit();
            }
        }
    </script>
</body>
</html>

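I am fairly sure something specific is needed to pass the request headers correctly, but I have never used this feature before, so I am somewhat at a loss.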

Thanks in advance.

php http-status-code-403 request-headers
1 Answer

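It looks like you are trying to retrieve an image URL from Pixiv and display it on your web page, but you are getting a 403 Forbidden error, probably because of Pixiv's restrictions on external access.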

To work around this, you can try setting additional headers on the HTTP request so that it more closely resembles a request from a browser.

<!DOCTYPE html>
<html>
<head>
    <title>Image Retrieval</title>
</head>
<body>
    <form method="post" action="<?php echo $_SERVER['PHP_SELF']; ?>">
        <label for="url">Enter the URL:</label>
        <input type="text" id="url" name="url">
        <button type="submit">Submit</button>
    </form>
    <?php
    if ($_SERVER["REQUEST_METHOD"] == "POST") {
        $url = $_POST["url"];
        $options = [
            'http' => [
                'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0\r\n" .
                            "Referer: https://www.pixiv.net/\r\n" .
                            "Accept-Language: en-US,en;q=0.5\r\n" .
                            "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n",
            ],
        ];
        $context = stream_context_create($options);
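        $html = file_get_contents($url, false, $context);
        // Find the downscaled master image (always in JPG format)
        $pattern = '/image" href="(.*?)"/';
        if ($html === false) {
            // The request itself failed (for example, a 403 response)
            echo '<p>Could not fetch the page.</p>';
        } elseif (preg_match($pattern, $html, $matches)) {
            // Escape the scraped value before writing it into the HTML
            $imageUrl = htmlspecialchars($matches[1], ENT_QUOTES);
            echo '<p>Image Link: <a id="image-link" href="' . $imageUrl . '">' . $imageUrl . '</a></p>';
        } else {
            echo '<p>No image link found in the page.</p>';
        }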
    }
    ?>

    <script>
        var imageLink = document.getElementById("image-link");
        if (imageLink) {
            window.location.href = imageLink.href;
        }
    </script>

    <!-- Autofill if querystring exists -->
    <script>
        var urlParams = new URLSearchParams(window.location.search);
        var pixivUrl = urlParams.get('pixivurl');
        if (pixivUrl) {
            var urlInput = document.getElementById('url');
            if (urlInput) {
                urlInput.value = pixivUrl;
            }
            var form = document.querySelector('form');
            if (form) {
                form.submit();
            }
        }
    </script>
</body>
</html>
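
Since the title asks about cURL, the same headers can also be sent through PHP's cURL extension instead of a stream context. The following is a minimal sketch, not a confirmed fix. The function names `buildPixivHeaders`, `fetchWithHeaders`, and `extractImageUrl` are hypothetical helpers made up for this illustration, and it assumes the cURL extension is enabled. Only the `curl_*` calls and `CURLOPT_*` constants are real PHP APIs.

```php
<?php
// Hypothetical helpers for illustration; only the curl_* functions are real PHP APIs.

/** Build the header list in the "Name: value" form that CURLOPT_HTTPHEADER expects. */
function buildPixivHeaders(string $referer = 'https://www.pixiv.net/'): array
{
    return [
        'Referer: ' . $referer,
        'Accept-Language: en-US,en;q=0.5',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    ];
}

/** Fetch a URL with the given headers applied; returns [HTTP status code, body]. */
function fetchWithHeaders(string $url, array $headers): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_HTTPHEADER     => $headers,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
        CURLOPT_TIMEOUT        => 15,
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? '' : $body];
}

/** Return the first image link matched by the pattern from the question, or null. */
function extractImageUrl(string $html): ?string
{
    return preg_match('/image" href="(.*?)"/', $html, $m) ? $m[1] : null;
}
```

A call would look like `[$status, $html] = fetchWithHeaders($url, buildPixivHeaders());`, followed by `extractImageUrl($html)`. Checking `$status` shows whether the 403 comes from the artwork page or from the image link. The JavaScript redirect makes the visitor's browser request the image directly, without any of these headers. If the image host checks the `Referer` header (an assumption, not verified here), the image would have to be fetched server-side with the same function and passed on to the visitor.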
