2014/09/06

Wrong text encoding when Jsoup parses a document

When a page's actual encoding differs from the encoding declared in its Content-Type, Jsoup decodes the text with the wrong charset and the content comes out garbled. To avoid this problem, specify the correct text encoding explicitly when parsing:

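import java.io.ByteArrayInputStream;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// requestUrl, searchKeyUID and timeout are defined elsewhere
Connection connection = Jsoup.connect(requestUrl)
        .data("type", "1")
        .data("searchKeyUID", searchKeyUID)
        .timeout(timeout)
        .method(Connection.Method.GET);

Connection.Response response = connection.execute();
// Parse the raw bytes as UTF-8 instead of relying on the declared charset
Document document = Jsoup.parse(new ByteArrayInputStream(response.bodyAsBytes()), "UTF-8", requestUrl);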
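
To see why the charset matters, here is a minimal sketch that uses only the JDK (no Jsoup). It assumes a page whose bytes are really UTF-8 but whose declaration says ISO-8859-1. The class name CharsetMismatchDemo, the helper decode and the sample string are made up for illustration and are not part of Jsoup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical helper: turn raw response bytes into text with a given charset,
    // which is the step a parser performs before it can read the markup.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Sample text "café", written with a Unicode escape to stay source-encoding neutral
        String original = "caf\u00e9";

        // The server actually sends UTF-8 bytes
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the (assumed) wrongly declared charset garbles the text
        String wrong = decode(body, StandardCharsets.ISO_8859_1);

        // Decoding with the real charset restores it
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
    }
}
```

The same raw bytes produce different strings depending on the charset used to decode them, which is why passing the real encoding to Jsoup.parse fixes the garbled output.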