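I'm trying to print the HTML text of https://top.baidu.com and https://www.qq.com, both of which use the GB2312 character encoding. Everything prints to the console correctly except the Chinese characters, which come out as unreadable text, for example: 㿴
However, when I change the address to https://www.sina.com.cn or https://world.taobao.com, the Chinese characters display fine. Both of those sites use UTF-8.
Is there anything I can do about this, short of politely asking Baidu and QQ to switch to UTF-8? Here is my code.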
try {
    String address1 = "https://top.baidu.com";    // unreadable
    String address2 = "https://www.qq.com";       // also unreadable
    String address3 = "https://www.sina.com.cn";  // readable
    String address4 = "https://world.taobao.com"; // readable, too
    URL url = new URL(address1);
    StringBuilder htmlText = new StringBuilder();
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    InputStream stream = connection.getInputStream();
    InputStreamReader reader = new InputStreamReader(stream);
    int data = reader.read();
    while (data != -1) {
        char current = (char) data;
        htmlText.append(current);
        data = reader.read();
    }
    System.out.println(htmlText);
} catch (Exception e) {
    e.printStackTrace();
}
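
One direction that may be worth trying: `new InputStreamReader(stream)` with no second argument decodes bytes using the platform default charset, which would explain why only the UTF-8 sites read correctly if that default happens to be UTF-8. Below is a minimal sketch, not a confirmed fix, that picks the charset from a `Content-Type` header value and passes it to the reader explicitly. The class name `CharsetAwareReader` and both helper methods are hypothetical names made up for illustration, and the response body is simulated with an in-memory byte array so the example runs without network access. It assumes the JRE ships the GB2312 charset, which full JDK distributions normally do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetAwareReader {

    // Hypothetical helper: pulls the charset out of a Content-Type value such as
    // "text/html; charset=GB2312". Returns the fallback when the header is
    // missing, has no charset parameter, or names an unsupported charset.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Hypothetical helper: reads the whole stream, decoding with the given
    // charset instead of the platform default.
    static String readAll(InputStream stream, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(stream, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulated response: two Chinese characters (U+4E2D, U+6587) encoded as GB2312.
        Charset charset = charsetFromContentType("text/html; charset=GB2312", StandardCharsets.UTF_8);
        byte[] body = "\u4e2d\u6587".getBytes(charset);
        System.out.println(readAll(new ByteArrayInputStream(body), charset));
    }
}
```

With the code from the question, the equivalent call would presumably be `readAll(connection.getInputStream(), charsetFromContentType(connection.getContentType(), Charset.forName("GBK")))`. The GBK fallback is an assumption for pages that declare their encoding only in a `<meta>` tag rather than in the HTTP header. Whether the console then shows the characters correctly also depends on the console's own encoding.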