Books in EPUB, HTML and XML formats

As mentioned in a previous post, we are working on producing electronic books in formats other than PDF. To give you an impression of our recent progress, here are HTML, XML and EPUB versions of the first book in the Conceptual Foundations of Language Science series, Natural Causes of Language by N.J. Enfield:

All of these formats were produced from the original LaTeX sources of the book using a development version of the texhs converter, with only some minimal styling applied.

Converting from LaTeX is no trivial task. In fact, parsing arbitrary TeX and LaTeX is impossible in general, because a document can redefine its own syntax while it is being processed. As a result, a lot of work went into the design and implementation of texhs. It includes a category-code-aware lexer, a macro processor for user-defined commands, a tightly integrated BibLaTeX engine (with on-the-fly translation from legacy BibTeX) and three separate API layers that can be used to process the book content programmatically.
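To illustrate what "category-code-aware" means, here is a minimal sketch in Python. It is not the texhs implementation: the names `Cat`, `default_catcodes`, `catcode_of` and `tokenize` are invented for this example, and it ignores many details of real TeX (such as skipping spaces after control words). The point is only that the same input yields different tokens depending on the current category code table.

```python
from enum import IntEnum


class Cat(IntEnum):
    """A subset of TeX's category codes (illustrative only)."""
    ESCAPE = 0
    BEGIN_GROUP = 1
    END_GROUP = 2
    SPACE = 10
    LETTER = 11
    OTHER = 12
    COMMENT = 14


def default_catcodes():
    """Return a small table resembling the usual LaTeX defaults."""
    return {"\\": Cat.ESCAPE, "{": Cat.BEGIN_GROUP, "}": Cat.END_GROUP,
            " ": Cat.SPACE, "%": Cat.COMMENT}


def catcode_of(ch, table):
    """Look up a character; unlisted ASCII letters are LETTER, the rest OTHER."""
    if ch in table:
        return table[ch]
    return Cat.LETTER if (ch.isascii() and ch.isalpha()) else Cat.OTHER


def tokenize(text, table):
    """Split text into (kind, value) tokens according to the given table."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        cat = catcode_of(text[i], table)
        if cat == Cat.ESCAPE:
            # A control word consists of all following LETTER characters.
            j = i + 1
            while j < n and catcode_of(text[j], table) == Cat.LETTER:
                j += 1
            if j == i + 1 and j < n:
                j += 1  # control symbol: exactly one non-letter character
            tokens.append(("ctrl", text[i + 1:j]))
            i = j
        elif cat == Cat.COMMENT:
            # Simplification: drop everything up to and including the newline.
            while i < n and text[i] != "\n":
                i += 1
            i += 1
        else:
            tokens.append((cat.name, text[i]))
            i += 1
    return tokens


table = default_catcodes()
before = tokenize(r"\foo@bar", table)   # "@" is OTHER, so the name ends at "foo"
table["@"] = Cat.LETTER                 # roughly what \makeatletter does
after = tokenize(r"\foo@bar", table)    # now "foo@bar" is one control word
```

Because the table can change in the middle of a document, a lexer of this kind cannot tokenize the whole input up front; it has to consult the current table as it goes.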

This architecture makes some more advanced output features easy to implement. For example, in the HTML output every inline citation produced by a LaTeX \cite command is hyperlinked to the corresponding entry in the bibliography, and the full citation is shown when you hover over the inline link with your mouse.
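One simple way to get this effect in HTML is a link with a title attribute, which browsers show as a tooltip. The following Python sketch assumes that approach; the actual markup that texhs emits may well differ. The function `render_citation`, the `ref-` anchor prefix and the bibliography entry are all made up for illustration.

```python
from html import escape


def render_citation(key, bibliography):
    """Render an inline citation as a link with the full entry as tooltip."""
    entry = bibliography[key]
    label = f"{entry['author']} {entry['year']}"
    full = f"{entry['author']}. {entry['year']}. {entry['title']}."
    return '<a class="citation" href="#ref-{}" title="{}">{}</a>'.format(
        escape(key, quote=True),   # anchor of the bibliography entry
        escape(full, quote=True),  # shown on hover
        escape(label),             # visible link text
    )


# Placeholder entry, not a real reference.
bibliography = {
    "doe2020": {"author": "Doe", "year": "2020", "title": "An Example Title"},
}
link = render_citation("doe2020", bibliography)
```

Since the lookup happens at conversion time, a \cite command with an unknown key fails with a `KeyError` in this sketch instead of silently producing a dead link.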

However, the converter is still in its early days and has yet to learn many commonly used TeX and LaTeX commands and environments. Fortunately, most of these are now easy to add, building on the flexible architecture of the converter. Still, you can probably never expect it to swallow arbitrary LaTeX content and have it magically turned into well-structured and valid XML or HTML. In fact, the converter actively rejects input that violates its underlying document model. This can be used for automatic quality assurance: If a LaTeX document passes the converter, you can be sure that, for example, every table has a caption.
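The idea of rejecting input that violates the document model can be sketched as follows. This is a simplified illustration in Python, not the way texhs is implemented: the `Table` class, the `ModelError` exception and the `check_tables` function are hypothetical names, and the only rule checked is the caption requirement mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


class ModelError(ValueError):
    """Raised when a document violates the (hypothetical) document model."""


@dataclass
class Table:
    rows: List[List[str]]
    caption: Optional[str] = None


def check_tables(tables):
    """Reject the document if any table lacks a non-empty caption."""
    for number, table in enumerate(tables, start=1):
        if table.caption is None or not table.caption.strip():
            raise ModelError(f"table {number} has no caption")


good = [Table(rows=[["a", "b"]], caption="Example values")]
check_tables(good)  # passes silently

bad = good + [Table(rows=[["c", "d"]])]
try:
    check_tables(bad)
    rejected = False
except ModelError as err:
    rejected = True
    message = str(err)
```

A converter built this way stops at the first violation, so any document that makes it through is known to satisfy the rule.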

For this reason, it currently requires some effort to prepare existing LangSci books for conversion into these additional formats. But this will get easier as texhs continues to evolve, and eventually EPUB, HTML and XML formats should become available for all of the books published by LangSci Press. For the moment, enjoy the preview versions linked above.
