jsoup home page: <https://jsoup.org>
jsoup download: <https://jsoup.org/download> (e.g. jsoup-1.15.3.jar, released August 24th, 2022)

jsoup tutorials
--------
jsoup cookbook: <https://jsoup.org/cookbook/>
Tutorialspoint: <https://www.tutorialspoint.com/jsoup/index.htm>
"Parsing HTML in Java with Jsoup": <https://www.baeldung.com/java-with-jsoup>
"Parsing and Extracting HTML with Jsoup": <https://howtodoinjava.com/java/library/complete-jsoup-tutorial/>

Wikipedia
-------
jsoup: <https://en.wikipedia.org/wiki/Jsoup>
Tag soup: <https://en.wikipedia.org/wiki/Tag_soup> 
Web scraping: <https://en.wikipedia.org/wiki/Web_scraping>
Data wrangling: <https://en.wikipedia.org/wiki/Data_wrangling>


------------------

HTML -> XHTML
-------


Stack Overflow: <https://stackoverflow.com/questions/29087077/is-it-possible-to-convert-html-into-xhtml-with-jsoup-1-8-1>

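Set the output syntax to Document.OutputSettings.Syntax.xml before serializing:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Parse (possibly malformed) HTML and re-serialize it using XML syntax rules.
private String toXHTML( String html ) {
    final Document document = Jsoup.parse(html);
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    return document.html();
}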


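You will most likely also want to call document.outputSettings().escapeMode(org.jsoup.nodes.Entities.EscapeMode.xhtml) so that &nbsp; is written as &#xa0;. The named entity is not one of XML's predefined entities, so XML parsers that do not load the XHTML DTD reject it. (Stack Overflow comment by KajMagnus, Sep 8, 2018 at 7:28.)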
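
The syntax and escape-mode settings can be combined in one call chain. The following is only a minimal sketch, assuming jsoup is on the classpath; the class name XhtmlConverter, the method name toXhtml, and the sample input are made up for illustration and do not come from the Stack Overflow thread.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

class XhtmlConverter {

    // Hypothetical helper: parse tag soup and serialize it as XHTML-style markup.
    static String toXhtml(String html) {
        Document document = Jsoup.parse(html);
        document.outputSettings()
                // XML syntax: void elements such as <br> are written self-closed.
                .syntax(Document.OutputSettings.Syntax.xml)
                // XHTML escape mode: non-breaking spaces become &#xa0; instead of &nbsp;.
                .escapeMode(Entities.EscapeMode.xhtml);
        return document.html();
    }

    public static void main(String[] args) {
        // Sample input with an unclosed <p>, a bare <br>, and a named entity.
        System.out.println(toXhtml("<p>one&nbsp;two<br>three"));
    }
}
```

Note that Jsoup.parse wraps the fragment in html, head, and body elements, so the result is a full document rather than just the converted fragment.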


----------------------

