jfiveparse passes all the non-scripted tokenizer and tree construction tests from the html5lib-tests suite.
It provides both fragment and full document parsing. It can parse directly from a String or stream through a Reader. Note: the encoding must be known, because the parser does not currently implement encoding autodetection.
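Because the encoding has to be supplied by the caller, one way to obtain a suitable Reader is to decode the byte stream with an explicit charset before handing it to the parser. The following is a minimal sketch using only the JDK; the class and method names (`KnownEncodingReader`, `open`, `readAll`) are hypothetical and not part of jfiveparse, and the step that passes the Reader to the parser is intentionally left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jfiveparse: shows how to build a Reader
// when the encoding of the input is already known.
final class KnownEncodingReader {

    // Wrap raw bytes in a Reader that decodes them with the given charset.
    // Nothing is autodetected: the caller must know the encoding up front.
    static Reader open(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Drain a Reader into a String. Only used here to check the decoding;
    // in real use the Reader would be passed to the parser instead.
    static String readAll(Reader reader) {
        StringWriter out = new StringWriter();
        try (Reader r = reader) {
            r.transferTo(out); // Reader.transferTo is available since Java 10
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample input encoded as ISO-8859-1 (an assumption for this demo).
        byte[] html = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        Reader reader = open(new ByteArrayInputStream(html), StandardCharsets.ISO_8859_1);
        System.out.println(readAll(reader));
    }
}
```

Decoding with the wrong charset does not fail; it silently produces different characters, which is why the charset should come from a reliable source such as an HTTP header or prior knowledge of the file.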
Requires Java 11.
As far as I know, no pure Java HTML5 parser currently passes the html5lib-tests suite (well, the more relevant tests :D; note: this project was published in October 2015).
Additionally, I wanted a library with a reduced footprint (and no dependencies). Currently the jar weighs around 150 KB. The target is to keep it under 200 KB.
Performance should be competitive with other Java parsers.
jfiveparse is licensed under the Apache License Version 2.0.
Maven:
<dependency>
    <groupId>ch.digitalfondue.jfiveparse</groupId>
    <artifactId>jfiveparse</artifactId>
    <version>1.1.1</version>
</dependency>
Gradle:
implementation 'ch.digitalfondue.jfiveparse:jfiveparse:1.1.1'
If you use it as a module, remember to add requires ch.digitalfondue.jfiveparse;
in your mo