
Roman Zoller and Matthias Lambert

Scope of the Project

The goal of the project is to create a converter from LaTeX documents to the EPUB and Mobipocket formats, with a focus on mathematical formulas. The latest version of the EPUB format (version 3) supports MathML, but this feature is largely unsupported by the e-readers on the market (Kindle, Kobo, etc.). However, formulas can be converted to PNG or SVG images and embedded in an HTML document, which is then converted to an e-book. Some existing tools do part of the job, but there is no complete and convincing solution.
The objective is to provide an easy-to-use way of converting simple LaTeX files that include formulas into e-books of reasonable quality (LaTeX is very complex, so perfect conversion is not possible in general).
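As a minimal sketch of the image-based approach described above, the Python code below replaces inline `$...$` formulas with `<img>` tags and collects the LaTeX source of each formula for a separate rendering step. It assumes simple inline math only (no escaped dollar signs, no display math); the naming scheme `math-<hash>.svg` and the functions `formula_filename` and `replace_formulas` are hypothetical illustrations, not part of any existing tool.

```python
import hashlib
import html
import re

# Inline math delimited by single dollar signs, e.g. $a^2 + b^2$.
# Assumption: the input has no escaped \$ and no $$...$$ display math.
INLINE_MATH = re.compile(r"\$([^$]+)\$")


def formula_filename(latex: str, ext: str = "svg") -> str:
    """Derive a stable file name from the formula source, so that
    identical formulas map to the same image and are rendered once."""
    digest = hashlib.sha1(latex.encode("utf-8")).hexdigest()[:12]
    return f"math-{digest}.{ext}"


def replace_formulas(text: str, ext: str = "svg"):
    """Replace each inline formula with an <img> tag.

    Returns the rewritten text and a dict mapping image file names to the
    LaTeX source that an external renderer still has to turn into images.
    """
    pending = {}

    def to_img(match):
        latex = match.group(1).strip()
        name = formula_filename(latex, ext)
        pending[name] = latex
        # The LaTeX source doubles as alt text for readers without image support.
        alt = html.escape(latex, quote=True)
        return f'<img class="math" src="{name}" alt="{alt}"/>'

    return INLINE_MATH.sub(to_img, text), pending


if __name__ == "__main__":
    page, todo = replace_formulas("Let $x^2$ and $y$ satisfy $x^2 < y$.")
    print(page)
    for name, latex in todo.items():
        print(name, "<-", latex)
```

The rendering of each collected formula to SVG or PNG is left to an external tool (for instance `latex` followed by `dvisvgm`). Hashing the formula source means that a formula repeated in the document is rendered only once.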

State of the Art

As we saw during the course, an EPUB file is a ZIP archive containing several files. The text is encoded as HTML, with some CSS for the formatting. However, the standard supports only a subset of CSS. Furthermore, when converting from LaTeX to HTML, part of the metadata can be lost along the chain of tools, resulting in an incomplete EPUB file. The idea is to combine existing tools as much as possible and to develop only the missing features.
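To make the structure of the archive concrete, here is a sketch that packs one XHTML chapter into a minimal EPUB 3 file using only the Python standard library. It shows the parts the container format requires: a `mimetype` entry stored first and uncompressed, `META-INF/container.xml` pointing at the package document, the package document (`content.opf`) with metadata, manifest and spine, and the navigation document EPUB 3 expects. The function name `build_epub` and the internal paths (`OEBPS/`, `text.xhtml`, `nav.xhtml`) are our own choices; the output should still be checked with a validator such as epubcheck before relying on it.

```python
import io
import uuid
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Tells the reading system where the package document (OPF) lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Minimal EPUB 3 navigation document (table of contents).
NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="text.xhtml">Start</a></li></ol></nav></body>
</html>
"""


def build_epub(title: str, body_xhtml: str, lang: str = "en") -> bytes:
    """Pack one XHTML chapter into a minimal EPUB 3 archive.

    body_xhtml is inserted verbatim and must already be well-formed XHTML.
    """
    t = escape(title)
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Package document: metadata, list of files (manifest), reading order (spine).
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{t}</dc:title>
    <dc:language>{lang}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""
    chapter = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{lang}">
<head><title>{t}</title></head>
<body>{body_xhtml}</body>
</html>
"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" must be the first entry and must be stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", NAV_XHTML)
        zf.writestr("OEBPS/text.xhtml", chapter)
    return buf.getvalue()
```

Writing the bytes returned by `build_epub("Test", "<p>Hello</p>")` to a `.epub` file should give an archive that e-book readers can open; a real document needs one manifest entry per chapter, stylesheet and formula image.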
Here is an overview of some of the tools that already exist, with their capabilities and shortcomings.

Converters from LaTeX to HTML:

  • plasTeX: no longer maintained (last update in July 2009). A complete LaTeX processing framework written in Python. It offers a flexible way of rendering equations and any other LaTeX feature not supported by HTML. The documentation provides an example of converting equations to PNG.
  • LaTeXML: a complete LaTeX processing framework written in Perl. Its main component converts LaTeX to an XML file with flexible tags, and a post-processor converts this XML file to HTML. Equations can be exported either as MathML or as images.
  • TeX4ht: its main author died in 2009, but it is still updated (last update in October 2012). A conversion system that produces output in a variety of markup languages, including HTML, from a TeX-based source file. It uses the native TeX compiler and interacts with TeX-based applications through style files and post-processors. It should be possible to configure this tool to export equations to SVG.
  • TtH: a tool written in C that converts LaTeX to HTML. One of its most interesting features is the way it handles equations. Rather than calling an external rendering engine that produces a bitmap or vector image, it converts mathematical equations directly into HTML, using regular characters arranged in HTML tables. This yields a compact HTML file that renders quickly, because the rendering engine does not need to fetch large amounts of data from external files. However, because it relies on regular characters, the complexity of the equations it can render may be limited.
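The table-based idea behind TtH can be illustrated with a toy renderer. The sketch below is not TtH's code or output format; it only shows how a fraction can be laid out with an ordinary HTML table (numerator row, an `<hr/>` as the fraction bar, denominator row) and a superscript with `<sup>`, so that no image is needed. The helper names `text`, `sup` and `frac` are hypothetical, and how the result looks depends on each e-reader's table and CSS support.

```python
import html


def text(s: str) -> str:
    """A leaf of the formula: plain characters, escaped for HTML."""
    return html.escape(s)


def sup(base: str, exponent: str) -> str:
    """Superscript using the standard HTML <sup> element."""
    return f"{base}<sup>{exponent}</sup>"


def frac(numerator: str, denominator: str) -> str:
    """Stack numerator and denominator in a three-row table, with an <hr/>
    standing in for the fraction bar. Arguments are HTML fragments, so
    fractions can be nested inside each other."""
    return (
        '<table style="display:inline-table;vertical-align:middle">'
        f'<tr><td style="text-align:center">{numerator}</td></tr>'
        "<tr><td><hr/></td></tr>"
        f'<tr><td style="text-align:center">{denominator}</td></tr>'
        "</table>"
    )


if __name__ == "__main__":
    # (x^2 + 1) / (a + b/c)
    print(frac(sup(text("x"), text("2")) + text(" + 1"),
               text("a + ") + frac(text("b"), text("c"))))
```

Because each helper returns an HTML fragment, constructs nest naturally; the limitation mentioned above appears quickly, since symbols such as large integrals or radicals are much harder to approximate with characters and tables.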

Converters from HTML to EPUB/Mobipocket:

  • Pandoc: a universal document converter. This library and its command-line tool, written in Haskell, convert documents from several input markup formats (including LaTeX) to a variety of output formats, including HTML and EPUB. However, the direct conversion from LaTeX to EPUB is not usable as-is. Some configuration is possible, but it remains to be seen whether external rendering engines can be used for equations.
  • Calibre ebook-convert: Calibre is a complete e-book management suite. It includes a command-line tool that converts between a variety of e-book formats. In particular, it can produce EPUB and Mobipocket e-books from an HTML source. However, the metadata has to be passed manually as command-line parameters, which limits its usability.
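One way to work around the metadata limitation of ebook-convert is to read the metadata from the LaTeX source and pass it on the command line. The sketch below extracts the arguments of `\title` and `\author` with a regular expression and builds an argument list using Calibre's `--title` and `--authors` options (multiple authors separated by `&`). It assumes the arguments contain no nested braces (so `\thanks{...}` inside `\author` is not handled); `latex_macro_argument` and `ebook_convert_command` are hypothetical helper names.

```python
import re


def latex_macro_argument(source: str, macro: str):
    """Return the argument of the first \\macro{...} in source, or None.

    Assumption: the argument contains no nested braces.
    """
    m = re.search(r"\\" + re.escape(macro) + r"\{([^{}]*)\}", source)
    return m.group(1).strip() if m else None


def ebook_convert_command(html_path: str, output_path: str, latex_source: str):
    """Build an ebook-convert argument list, filling in metadata that
    would otherwise have to be typed by hand."""
    cmd = ["ebook-convert", html_path, output_path]
    title = latex_macro_argument(latex_source, "title")
    if title:
        cmd += ["--title", title]
    author = latex_macro_argument(latex_source, "author")
    if author:
        # LaTeX separates authors with \and; Calibre expects "&" between them.
        authors = [a.strip() for a in re.split(r"\\and\b", author) if a.strip()]
        cmd += ["--authors", " & ".join(authors)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Calibre is installed; ebook-convert chooses the output format (EPUB or Mobipocket) from the extension of the output file.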

Conclusion

In conclusion, many tools exist, but most of them are incomplete, no longer developed, or not quite suited to the purpose of this project. The work of this project consists of combining existing tools with custom code and scripts into a complete, usable workflow that produces, from a standard LaTeX file, an e-book readable on any standard e-reader.
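Such a workflow could be orchestrated by a small driver script. The sketch below chains one possible combination of the tools listed in the state of the art: LaTeXML (`latexml`, then `latexmlpost`) for LaTeX to HTML, and Calibre's `ebook-convert` for HTML to EPUB. This choice of tools is an assumption for illustration, not a decision stated in this proposal; custom steps (such as replacing formulas with images or filling in metadata) would go between the HTML and EPUB stages. The function names `conversion_steps` and `run_pipeline` are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def conversion_steps(tex_file: str):
    """Return the external commands of one possible LaTeX -> HTML -> EPUB chain.

    Assumption: LaTeXML handles LaTeX -> XML -> HTML and Calibre handles
    HTML -> EPUB; other converters could be swapped in here.
    """
    tex = Path(tex_file)
    xml = tex.with_suffix(".xml")
    html_out = tex.with_suffix(".html")
    epub = tex.with_suffix(".epub")
    return [
        ["latexml", f"--dest={xml}", str(tex)],            # LaTeX -> LaTeXML XML
        ["latexmlpost", f"--dest={html_out}", str(xml)],   # XML -> HTML
        ["ebook-convert", str(html_out), str(epub)],       # HTML -> EPUB
    ]


def run_pipeline(tex_file: str, dry_run: bool = False) -> None:
    """Print each step and, unless dry_run is set, execute it."""
    for cmd in conversion_steps(tex_file):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing tool.
            subprocess.run(cmd, check=True)
```

`run_pipeline("paper.tex", dry_run=True)` only prints the commands; without `dry_run`, each step runs in turn and a failing tool raises `subprocess.CalledProcessError`. Math output options of `latexmlpost` (MathML or images) are documented in the LaTeXML manual.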
Today, the best way to read scientific publications on the go is a power-hungry, high-resolution touch-screen tablet capable of rendering PDF files, whereas regular books need no such hardware. As e-readers become widespread, more and more people use them instead of paper books, and it would be practical to bring scientific papers within the scope of regular e-readers.

