Is there a way to use Java to download images from a website when the image URLs have no file extension?


With the whole COVID-19 crisis going on around the world, I decided to start a nerdy little project.

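I'm trying to mass-collect digital copies of Magic: The Gathering cards to use in a game called Tabletop Simulator. I'm also a bit rusty but wanted to jump back into programming, so why not with this?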

Where I am now: I have a program (provided below) that should currently pull every image with a common file extension from a web page.

However, I noticed that the page source looks like this:

<a href="../Card/Details.aspx?multiverseid=482864" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImageLink" onclick="return CardLinkAction(event, this, 'SameWindow');">

<img src="../../Handlers/Image.ashx?multiverseid=482864&amp;type=card" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImage" style="border-radius:6px;-webkit-border-radius:6px;-moz-border-radius:6px;" width="95" height="132" alt="Abandoned Sarcophagus" border="0">

</a>

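This link is what pulls up the card image... I noticed a format that is new to me, ".jfif", which I assume is a newer version of ".jpeg". How can I extract these images from the page?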

The code is not my own idea; it was taken from an older post.

CODE:

import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

import javax.imageio.ImageIO;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class Extract_cards {

    public static void main(String args[]) throws Exception {

        String webUrl = "https://gatherer.wizards.com/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set=%20[%22Ikoria%20Commander%22]";
        URL url = new URL(webUrl);
        URLConnection connection = url.openConnection();
        InputStream is = connection.getInputStream();
        InputStreamReader isr = new InputStreamReader(is);
        BufferedReader br = new BufferedReader(isr);

        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        htmlKit.read(br, htmlDoc, 0);

        for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A); iterator.isValid(); iterator.next()) {
            AttributeSet attributes = iterator.getAttributes();
            String imgSrc = (String) attributes.getAttribute(HTML.Attribute.HREF);

            System.out.println(imgSrc);
            if (imgSrc != null && (imgSrc.toLowerCase().endsWith(".jpg") || (imgSrc.endsWith(".jfif")) || (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".jpeg")) || (imgSrc.endsWith(".bmp")) || (imgSrc.endsWith(".ico")))) {
                try {
                    downloadImage(webUrl, imgSrc);
                } catch (IOException ex) {
                    System.out.println(ex.getMessage());
                }
            }
        }
    }
    private static void downloadImage(String url, String imgSrc) throws IOException {
        BufferedImage image = null;
        try {
            if (!(imgSrc.startsWith("http"))) {
                url = url + imgSrc;
            } else {
                url = imgSrc;
            }
            imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
            String imageFormat = null;
            imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
            String imgPath = null;
            imgPath = "/img depository" + imgSrc + "";
            URL imageUrl = new URL(url);
            image = ImageIO.read(imageUrl);
            if (image != null) {
                File file = new File(imgPath);
                ImageIO.write(image, imageFormat, file);
                System.out.println("Success!");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

Console output:

../Default.aspx
javascript:void(0);
../Default.aspx
../Advanced.aspx
../Card/Details.aspx?action=random
../Default.aspx
../Advanced.aspx
../Card/Details.aspx?action=random
../Settings.aspx
../Language.aspx
../Help.aspx
/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=2&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=3&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
javascript:void(0);
javascript:void(0);
/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set=%20[%22Ikoria%20Commander%22]&removesortkey=abc
/Pages/Advanced.aspx?action=advanced&set=+["Ikoria Commander"]&output=standard
../Card/Details.aspx?multiverseid=482864
../Card/Details.aspx?multiverseid=482864
../Card/Details.aspx?multiverseid=482864
../Card/Details.aspx?multiverseid=430847
../Card/Details.aspx?multiverseid=482826
../Card/Details.aspx?multiverseid=482826
../Card/Details.aspx?multiverseid=482826
../Card/Details.aspx?multiverseid=386464
../Card/Details.aspx?multiverseid=482827
../Card/Details.aspx?multiverseid=482827
../Card/Details.aspx?multiverseid=482827
../Card/Details.aspx?multiverseid=386467
../Card/Details.aspx?multiverseid=420794
../Card/Details.aspx?multiverseid=482793
../Card/Details.aspx?multiverseid=482793
../Card/Details.aspx?multiverseid=482793
../Card/Details.aspx?multiverseid=189880
../Card/Details.aspx?multiverseid=207333
../Card/Details.aspx?multiverseid=247317
../Card/Details.aspx?multiverseid=226906
../Card/Details.aspx?multiverseid=265718
../Card/Details.aspx?multiverseid=376237
../Card/Details.aspx?multiverseid=380236
../Card/Details.aspx?multiverseid=405117
../Card/Details.aspx?multiverseid=430309
../Card/Details.aspx?multiverseid=446868
../Card/Details.aspx?multiverseid=451082
../Card/Details.aspx?multiverseid=482828
../Card/Details.aspx?multiverseid=482828
../Card/Details.aspx?multiverseid=482828
../Card/Details.aspx?multiverseid=416830
../Card/Details.aspx?multiverseid=482700
../Card/Details.aspx?multiverseid=482700
../Card/Details.aspx?multiverseid=482700
../Card/Details.aspx?multiverseid=417575
../Card/Details.aspx?multiverseid=430541
../Card/Details.aspx?multiverseid=456526
**continues**...
../Card/Details.aspx?multiverseid=409857
../Card/Details.aspx?multiverseid=482799
../Card/Details.aspx?multiverseid=482799
../Card/Details.aspx?multiverseid=482799
../Card/Details.aspx?multiverseid=124047
../Card/Details.aspx?multiverseid=482901
../Card/Details.aspx?multiverseid=482901
../Card/Details.aspx?multiverseid=482901
../Card/Details.aspx?multiverseid=189273
../Card/Details.aspx?multiverseid=275257
../Card/Details.aspx?multiverseid=416976
../Card/Details.aspx?multiverseid=420912
../Card/Details.aspx?multiverseid=423542
../Card/Details.aspx?multiverseid=433181
../Card/Details.aspx?multiverseid=446986
../Card/Details.aspx?multiverseid=470788
/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=2&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=3&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
https://magic.wizards.com/
../TermsOfUse.aspx
../PrivacyPolicy.aspx
../CodeOfConduct.aspx
http://www.magicthegathering.com
https://magic.wizards.com/en/content/magic-online-products-game-info
../Settings.aspx
../Language.aspx
https://company.wizards.com/policies/web/cookie
../Help.aspx
https://company.wizards.com
java html css bufferedimage javax.imageio
1 Answer

I don't know where you saw .jfif at that link, because I can't see it anywhere.

What I see is the link URL: https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card
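That absolute URL is what the relative src value in the question's HTML resolves to against the page URL. As a minimal sketch, assuming the standard java.net.URI class and a shortened version of the page URL (the class and method names here are hypothetical), the resolution could look like this:

```java
import java.net.URI;

final class RelativeLinks {
    /** Resolves a link found in a page against the URL of that page. */
    static String resolve(String pageUrl, String link) {
        // URI.resolve handles "../" segments and keeps the link's own query string.
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        // Shortened page URL, used here only for illustration.
        String page = "https://gatherer.wizards.com/Pages/Search/Default.aspx?page=0";
        // The src attribute from the question, with &amp; already decoded to &.
        String src = "../../Handlers/Image.ashx?multiverseid=482864&type=card";
        System.out.println(resolve(page, src));
    }
}
```

Simply concatenating the page URL and the src value, as the question's downloadImage method does, would not produce this address.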

When I open it in a web browser (Firefox, in my case), I see that the server response has the following HTTP headers:

Cache-Control: public
Content-Type: image/jpeg
Expires: Fri, 16 Apr 2021 04:30:35 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Thu, 16 Apr 2020 04:30:35 GMT
Content-Length: 170170

The important part is Content-Type: its value, image/jpeg, tells you that the content is a JPEG image.
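In code, this suggests checking the Content-Type header rather than the end of the URL. The following is only a sketch under that assumption; the class and method names are hypothetical, and URLConnection.getContentType() is the standard call that returns the header value (or null when the server sends none):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

final class ImageTypeCheck {
    /** True when a Content-Type header value names an image media type. */
    static boolean isImageContentType(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("image/");
    }

    /** Connects to the address and checks the Content-Type the server declares. */
    static boolean servesImage(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        // getContentType() connects if needed and reads the response header.
        return isImageContentType(connection.getContentType());
    }

    public static void main(String[] args) throws IOException {
        // Pass a URL as the first argument to query a live server.
        if (args.length > 0) {
            System.out.println(servesImage(args[0]));
        } else {
            System.out.println(isImageContentType("image/jpeg"));
        }
    }
}
```

A check like this could stand in for the chain of endsWith(".jpg"), endsWith(".png") tests in the question, since the handler URL has no extension to test.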

Unfortunately, the server does not suggest a file name, which it would do with a header like this:

Content-Disposition: attachment; filename="filename.jpg"

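Without a suggestion from the server, and since you know and understand the URL, you could for example write code that names the file from the URL and the Content-Type header, e.g. card482864.jpeg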
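One way to sketch that naming idea, assuming the multiverseid query parameter identifies the card and the Content-Type subtype is acceptable as a file extension (the class, method, and directory names are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CardImageSaver {
    private static final Pattern MULTIVERSE_ID = Pattern.compile("[?&]multiverseid=(\\d+)");

    /** Builds a name such as card482864.jpeg from the URL and the Content-Type value. */
    static String fileNameFor(String imageUrl, String contentType) {
        Matcher matcher = MULTIVERSE_ID.matcher(imageUrl);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No multiverseid in " + imageUrl);
        }
        if (contentType == null) {
            throw new IllegalArgumentException("No Content-Type for " + imageUrl);
        }
        // Drop any parameters after ';' and keep only the media type.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (!mediaType.startsWith("image/")) {
            throw new IllegalArgumentException("Not an image: " + contentType);
        }
        return "card" + matcher.group(1) + "." + mediaType.substring("image/".length());
    }

    /** Saves the response body unchanged, so no image decoding or re-encoding happens. */
    static Path download(String imageUrl, Path directory) throws IOException {
        URLConnection connection = new URL(imageUrl).openConnection();
        String name = fileNameFor(imageUrl, connection.getContentType());
        Files.createDirectories(directory);
        Path target = directory.resolve(name);
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // With a URL argument, download into a hypothetical "cards" directory.
        if (args.length > 0) {
            System.out.println(download(args[0], Paths.get("cards")));
        } else {
            System.out.println(fileNameFor(
                    "https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card",
                    "image/jpeg"));
        }
    }
}
```

Copying the stream with Files.copy keeps the bytes exactly as the server sent them, which sidesteps choosing a format name for ImageIO.write.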
