Typesetting ePubs in LaTeX

Delhi, India

In a recent class on typesetting, we were assigned the task of typesetting an entire book, preferably an open book, in our favourite typesetting program within a few days. Typesetting is not easy, especially in LaTeX, and with my skills gone rusty it was a hot pot of dilemmas that I think should not be a part of this post at all.

The book I chose was in ePub format and desperately needed to be converted to LaTeX files so that I could typeset them in Overleaf, ShareLaTeX or TeXworks, for that matter. I have moved in favour of the online compilers, as I no longer have to fiddle with hunting down the required .sty files, which are so tedious to find and use. I really wish there was some kind of package manager for LaTeX too, like pip for Python. I wish!

An ePub is essentially a zipped archive of .html files following a specific numbering scheme or format, so the task was essentially to convert the .html files to .tex. I did try a few things in Python with Beautiful Soup. It did not taste good. I had to look for something else.
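Since an ePub is just a zip archive, a first look inside needs nothing beyond Python's standard library. What follows is a minimal sketch, not the code I actually used: the function name list_html_members is made up for illustration, and sorting by file name is an assumption that only works when the pages are numbered in order (the real reading order lives in the spine of the ePub's package file).

```python
import zipfile


def list_html_members(epub):
    """Return the HTML/XHTML page names inside an ePub, sorted by name.

    `epub` may be a path or a binary file-like object.
    Assumption: sorting by name approximates the reading order.
    """
    with zipfile.ZipFile(epub) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith((".html", ".xhtml", ".htm"))
        )
```

Calling list_html_members on the book file gives the list of pages to convert, and zipfile.ZipFile(...).extractall(...) would unpack them into a folder.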

I had heard of pandoc while writing my thesis a few years back, but it felt too complex at the time; now it was a necessity I just couldn't ignore. It's interesting how the perceived complexity of an act changes with necessity. Pandoc is the Swiss army knife of document handling. It can convert almost anything to almost anything. It took me a while to figure out that it was a CLI tool. (It happens.)

It could translate the HTML pages almost perfectly for use in Overleaf. Only some italics were not coming through properly, which I fixed with sed. I put it all together in a shell script that iterates over the entire folder, and in a blink I had a directory full of .tex files ready for the next stage.
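For anyone who prefers Python to a shell script, the same loop can be sketched roughly as below. This is an illustration under assumptions, not my actual script: the function names are made up, the italics pattern (a span with class "italic") is a hypothetical stand-in for whatever markup your book uses, and the fix is applied to the HTML before pandoc sees it, which may not be where you need it. It also assumes pandoc is installed and on the PATH.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical pattern: assumes italics are marked as <span class="italic">...</span>.
# Nested spans inside an italic span are not handled by this simple regex.
ITALIC_SPAN = re.compile(r'<span class="italic">(.*?)</span>', re.DOTALL)


def fix_italics(html):
    """Rewrite italic spans as <em> so that pandoc emits \\emph{...}."""
    return ITALIC_SPAN.sub(r"<em>\1</em>", html)


def pandoc_command(out_path):
    """Build the pandoc call; with no input file, pandoc reads from stdin."""
    return ["pandoc", "--from", "html", "--to", "latex", "--output", str(out_path)]


def convert_folder(src_dir, out_dir):
    """Convert every .html file in src_dir to a .tex file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in sorted(Path(src_dir).glob("*.html")):
        target = out / (page.stem + ".tex")
        html = fix_italics(page.read_text(encoding="utf-8"))
        # Feed the patched HTML to pandoc on stdin; fail loudly on errors.
        subprocess.run(pandoc_command(target), input=html, encoding="utf-8", check=True)
        written.append(target)
    return written
```

Something like convert_folder("book_html", "book_tex"), with both folder names being placeholders, would then leave one .tex file per page ready to upload.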

The work was only 10% done, as a lot of widows needed new husbands and orphans needed parents. I hope you get what I mean!

Just in case you too are a fan of LaTeX, the code I used and wrote is here.

Long live LaTeX!

Cheers,

Rohit