htmlcxx 0.87 Simple non-validating CSS1 and HTML parser for C++

htmlcxx is a simple non-validating CSS1 and HTML parser for C++. Although there are several other HTML parsers available, htmlcxx has some characteristics that make it unique:

  • STL-like navigation of the DOM tree, using the excellent tree.hh library from Kasper Peeters

  • The original document can be reproduced exactly, character by character, from the parse tree

  • Bundled CSS parser

  • Optional parsing of attributes

  • C++ code that looks like C++ (not so true anymore)

  • Offsets of tags/elements in the original document are stored in the nodes of the DOM tree
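
To make the points about stored offsets and exact reproduction concrete, here is a minimal, self-contained sketch in standard C++. It does not use htmlcxx and is not how htmlcxx is implemented: the names Span, scan_spans and rebuild are hypothetical, and the result is a flat list of spans rather than a DOM tree. It only shows that if every node records its offset and length in the original input, the document can be rebuilt unchanged without any validation or normalization.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type (NOT htmlcxx's own node class): it only records
// which part of the original input a piece of markup or text came from.
struct Span {
    bool is_tag;         // true for "<...>", false for text between tags
    std::size_t offset;  // start position in the original document
    std::size_t length;  // number of characters covered
};

// Split the input into consecutive tag and text spans. Nothing is
// validated: an unterminated '<' simply runs to the end of the input.
std::vector<Span> scan_spans(const std::string& doc) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < doc.size()) {
        const bool is_tag = (doc[pos] == '<');
        std::size_t end;
        if (is_tag) {
            const std::size_t close = doc.find('>', pos);
            end = (close == std::string::npos) ? doc.size() : close + 1;
        } else {
            const std::size_t open = doc.find('<', pos);
            end = (open == std::string::npos) ? doc.size() : open;
        }
        spans.push_back(Span{is_tag, pos, end - pos});
        pos = end;  // spans are contiguous, so no input character is skipped
    }
    return spans;
}

// Rebuild the document using only the stored offsets and lengths.
std::string rebuild(const std::string& doc, const std::vector<Span>& spans) {
    std::string out;
    for (const Span& s : spans) {
        out.append(doc, s.offset, s.length);
    }
    return out;
}
```

Because the spans are contiguous and cover the whole input, rebuild(doc, scan_spans(doc)) returns doc unchanged, including odd spacing and letter case inside tags. A real parser would hang such offsets on tree nodes instead of a flat vector.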