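With the whole COVID-19 crisis going on around the world, I decided to start a nerdy little project.
I'm trying to bulk-collect digital copies of Magic cards for use in a game called "Tabletop Simulator". I'm also a bit rusty, but I wanted to jump back into programming, so why not this way?
Where I am now: I've written a program (provided below) that should currently pull every image with any of the common extensions from a website.
However, I noticed that the page source looks like this: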
<a href="../Card/Details.aspx?multiverseid=482864" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImageLink" onclick="return CardLinkAction(event, this, 'SameWindow');">
<img src="../../Handlers/Image.ashx?multiverseid=482864&type=card" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImage" style="border-radius:6px;-webkit-border-radius:6px;-moz-border-radius:6px;" width="95" height="132" alt="Abandoned Sarcophagus" border="0">
</a>
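This link is what pulls up the card image... I noticed a format that is new to me, ".jfif", which I assume is a newer version of ".jpeg". How do I extract it from the page?
The code is not my own idea; it was taken from an older post.
CODE: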
import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import javax.imageio.ImageIO;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
public class Extract_cards {
    public static void main(String args[]) throws Exception {
        String webUrl = "https://gatherer.wizards.com/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set=%20[%22Ikoria%20Commander%22]";
        URL url = new URL(webUrl);
        URLConnection connection = url.openConnection();
        InputStream is = connection.getInputStream();
        InputStreamReader isr = new InputStreamReader(is);
        BufferedReader br = new BufferedReader(isr);
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        htmlKit.read(br, htmlDoc, 0);
        for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A); iterator.isValid(); iterator.next()) {
            AttributeSet attributes = iterator.getAttributes();
            String imgSrc = (String) attributes.getAttribute(HTML.Attribute.HREF);
            System.out.println(imgSrc);
            if (imgSrc != null && (imgSrc.toLowerCase().endsWith(".jpg") || (imgSrc.endsWith(".jfif")) || (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".jpeg")) || (imgSrc.endsWith(".bmp")) || (imgSrc.endsWith(".ico")))) {
                try {
                    downloadImage(webUrl, imgSrc);
                } catch (IOException ex) {
                    System.out.println(ex.getMessage());
                }
            }
        }
    }

    private static void downloadImage(String url, String imgSrc) throws IOException {
        BufferedImage image = null;
        try {
            if (!(imgSrc.startsWith("http"))) {
                url = url + imgSrc;
            } else {
                url = imgSrc;
            }
            imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
            String imageFormat = null;
            imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
            String imgPath = null;
            imgPath = "/img depository" + imgSrc + "";
            URL imageUrl = new URL(url);
            image = ImageIO.read(imageUrl);
            if (image != null) {
                File file = new File(imgPath);
                ImageIO.write(image, imageFormat, file);
                System.out.println("Success!");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
Console output:
../Default.aspx
javascript:void(0);
../Default.aspx
../Advanced.aspx
../Card/Details.aspx?action=random
../Default.aspx
../Advanced.aspx
../Card/Details.aspx?action=random
../Settings.aspx
../Language.aspx
../Help.aspx
/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=2&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=3&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
javascript:void(0);
javascript:void(0);
/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set=%20[%22Ikoria%20Commander%22]&removesortkey=abc
/Pages/Advanced.aspx?action=advanced&set=+["Ikoria Commander"]&output=standard
../Card/Details.aspx?multiverseid=482864
../Card/Details.aspx?multiverseid=482864
../Card/Details.aspx?multiverseid=482864
../Card/Details.aspx?multiverseid=430847
../Card/Details.aspx?multiverseid=482826
../Card/Details.aspx?multiverseid=482826
../Card/Details.aspx?multiverseid=482826
../Card/Details.aspx?multiverseid=386464
../Card/Details.aspx?multiverseid=482827
../Card/Details.aspx?multiverseid=482827
../Card/Details.aspx?multiverseid=482827
../Card/Details.aspx?multiverseid=386467
../Card/Details.aspx?multiverseid=420794
../Card/Details.aspx?multiverseid=482793
../Card/Details.aspx?multiverseid=482793
../Card/Details.aspx?multiverseid=482793
../Card/Details.aspx?multiverseid=189880
../Card/Details.aspx?multiverseid=207333
../Card/Details.aspx?multiverseid=247317
../Card/Details.aspx?multiverseid=226906
../Card/Details.aspx?multiverseid=265718
../Card/Details.aspx?multiverseid=376237
../Card/Details.aspx?multiverseid=380236
../Card/Details.aspx?multiverseid=405117
../Card/Details.aspx?multiverseid=430309
../Card/Details.aspx?multiverseid=446868
../Card/Details.aspx?multiverseid=451082
../Card/Details.aspx?multiverseid=482828
../Card/Details.aspx?multiverseid=482828
../Card/Details.aspx?multiverseid=482828
../Card/Details.aspx?multiverseid=416830
../Card/Details.aspx?multiverseid=482700
../Card/Details.aspx?multiverseid=482700
../Card/Details.aspx?multiverseid=482700
../Card/Details.aspx?multiverseid=417575
../Card/Details.aspx?multiverseid=430541
../Card/Details.aspx?multiverseid=456526
**continues**...
../Card/Details.aspx?multiverseid=409857
../Card/Details.aspx?multiverseid=482799
../Card/Details.aspx?multiverseid=482799
../Card/Details.aspx?multiverseid=482799
../Card/Details.aspx?multiverseid=124047
../Card/Details.aspx?multiverseid=482901
../Card/Details.aspx?multiverseid=482901
../Card/Details.aspx?multiverseid=482901
../Card/Details.aspx?multiverseid=189273
../Card/Details.aspx?multiverseid=275257
../Card/Details.aspx?multiverseid=416976
../Card/Details.aspx?multiverseid=420912
../Card/Details.aspx?multiverseid=423542
../Card/Details.aspx?multiverseid=433181
../Card/Details.aspx?multiverseid=446986
../Card/Details.aspx?multiverseid=470788
/Pages/Search/Default.aspx?page=0&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=2&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=3&output=standard&action=advanced&set= ["Ikoria Commander"]
/Pages/Search/Default.aspx?page=1&output=standard&action=advanced&set= ["Ikoria Commander"]
https://magic.wizards.com/
../TermsOfUse.aspx
../PrivacyPolicy.aspx
../CodeOfConduct.aspx
http://www.magicthegathering.com
https://magic.wizards.com/en/content/magic-online-products-game-info
../Settings.aspx
../Language.aspx
https://company.wizards.com/policies/web/cookie
../Help.aspx
https://company.wizards.com
I don't know where you saw .jfif in that link, because I can't see it anywhere.
What I see is the link URL: https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card
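For reference, that absolute URL is what the relative src value in the question's HTML resolves to against the page URL. The question's downloadImage simply appends imgSrc to the page URL, which would not yield a valid address. A minimal sketch of the resolution using java.net.URI follows; the class and method names are made up for illustration, and the page URL is shortened to its first query parameter.

```java
import java.net.URI;

class ResolveImageUrl {
    /** Resolves a (possibly relative) src attribute against the URL of the page it came from. */
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Page URL from the question, shortened to its first query parameter (assumption for brevity).
        String pageUrl = "https://gatherer.wizards.com/Pages/Search/Default.aspx?page=0";
        // src attribute of the <img> tag shown in the question.
        String src = "../../Handlers/Image.ashx?multiverseid=482864&type=card";
        System.out.println(resolve(pageUrl, src));
    }
}
```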
When I open it in a web browser (Firefox, in my case), I see that the server response has the following HTTP headers:
Cache-Control: public
Content-Type: image/jpeg
Expires: Fri, 16 Apr 2021 04:30:35 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Thu, 16 Apr 2020 04:30:35 GMT
Content-Length: 170170
The important part is Content-Type, whose value image/jpeg tells you that the content is a JPEG image.
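In Java you can read the same header with URLConnection.getContentType(), which may return null if the server sends none. So instead of testing what the URL ends with, as the code in the question does, you could test the header value. Here is a rough sketch of such a check; ContentTypes, mediaType and isImage are names I made up, not part of any library.

```java
import java.util.Locale;

class ContentTypes {
    /** Returns the media type without parameters, lower-cased, e.g. "image/jpeg". */
    static String mediaType(String contentType) {
        if (contentType == null) {
            return ""; // URLConnection.getContentType() may return null
        }
        int semicolon = contentType.indexOf(';');
        String type = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** True when the header value describes some kind of image. */
    static boolean isImage(String contentType) {
        return mediaType(contentType).startsWith("image/");
    }

    public static void main(String[] args) {
        System.out.println(isImage("image/jpeg"));               // true
        System.out.println(isImage("text/html; charset=utf-8")); // false
    }
}
```

With a live connection the call would look like isImage(connection.getContentType()).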
Unfortunately, the server doesn't provide a suggested filename, which would come in a header like this:
Content-Disposition: attachment; filename="filename.jpg"
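With no suggestion from the server, and since you know and understand the URL, you can, for example, write code that names the file from the URL and the Content-Type header, e.g. card482864.jpeg.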
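Something along these lines might work as a starting point. It is only a sketch: CardDownloader, fileNameFor and download are hypothetical names, the "card" prefix and the use of the Content-Type subtype as the extension are my own choices, and the bytes are copied as-is with Files.copy rather than decoded and re-encoded through ImageIO.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CardDownloader {
    private static final Pattern MULTIVERSE_ID = Pattern.compile("[?&]multiverseid=(\\d+)");

    /** Builds a name such as "card482864.jpeg" from the image URL and the Content-Type value. */
    static String fileNameFor(String imageUrl, String contentType) {
        Matcher m = MULTIVERSE_ID.matcher(imageUrl);
        if (!m.find()) {
            throw new IllegalArgumentException("No multiverseid in " + imageUrl);
        }
        String type = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT).trim();
        if (!type.startsWith("image/")) {
            throw new IllegalArgumentException("Not an image: " + contentType);
        }
        // "image/jpeg" -> "jpeg"; anything after a ';' is a parameter and is dropped.
        String subtype = type.substring("image/".length()).split(";")[0].trim();
        return "card" + m.group(1) + "." + subtype;
    }

    /** Downloads one image into targetDir and returns the path it was saved to. */
    static Path download(String imageUrl, Path targetDir) throws IOException {
        URLConnection connection = URI.create(imageUrl).toURL().openConnection();
        String name = fileNameFor(imageUrl, connection.getContentType());
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(name);
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) {
        // Only the naming step is shown here, so nothing is fetched from the network.
        String url = "https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card";
        System.out.println(fileNameFor(url, "image/jpeg")); // card482864.jpeg
    }
}
```

To actually fetch a card you would call something like CardDownloader.download(url, Path.of("img depository")), where the directory name is just the one used in the question.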