Java - Reading page source from a URL returns unknown characters

Question    Votes: 1    Answers: 1

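I am using the code below in NetBeans to read the page source of a URL (https://www.amazon.com) with the "UTF-8" charset, but it returns unknown characters (see the attached image). I have no idea what the problem is. I would be grateful if someone could help me modify the code so that it works properly. Thanks.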


public static String getURLSource(String url) throws IOException
{
    URL urlObject = new URL(url);
    URLConnection urlConnection = urlObject.openConnection();
    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");

    return toString(urlConnection.getInputStream());
}

private static String toString(InputStream inputStream) throws IOException
{
    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8")))
    {
        String inputLine;
        StringBuilder stringBuilder = new StringBuilder();
        while ((inputLine = bufferedReader.readLine()) != null)
        {
            stringBuilder.append(inputLine);
        }

        return stringBuilder.toString();
    }
}
java amazon-web-services aws-lambda amazon-dynamodb amazon
1 Answer

0 votes

Use HttpsURLConnection instead of URLConnection. See a similar question.
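
A minimal sketch of what that suggestion might look like follows. The class name UrlSourceReader and the helper decode are hypothetical, introduced only for illustration. The gzip handling is an assumption that the answer does not state: one common cause of unreadable output like this is a compressed response body being read as plain text, so the sketch checks the Content-Encoding header before decoding. Whether that is the actual cause here is not confirmed by the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical helper class; the name is not from the original post.
final class UrlSourceReader {

    // Fetches the page at an https:// URL and returns its body as text.
    static String getURLSource(String url) throws IOException {
        // For https:// URLs, openConnection() returns an HttpsURLConnection.
        HttpsURLConnection connection = (HttpsURLConnection) new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0");
        // Assumption: the server may compress the body, so request only gzip,
        // which decode() below knows how to unpack.
        connection.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream body = connection.getInputStream()) {
            return decode(body, connection.getContentEncoding(), StandardCharsets.UTF_8);
        } finally {
            connection.disconnect();
        }
    }

    // Decodes a response body, unpacking gzip first when the header says so.
    // contentEncoding may be null when the server sent no Content-Encoding header.
    static String decode(InputStream body, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(body)
                : body;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                text.append(line).append('\n');
            }
            return text.toString();
        }
    }
}
```

Calling UrlSourceReader.getURLSource("https://www.amazon.com") would then replace the original getURLSource call. The charset is hard-coded to UTF-8 as in the question; reading it from the Content-Type header would be more robust.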
