One pet peeve I have with my Cybook Gen 3 is its inability to properly display Unicode characters in plain text files. I don’t need anything fancy like Japanese characters, just simple things like “ and ” (as opposed to the plain " and "). To solve this problem I’ve been thinking about adding an --asciize option to calibre. I say thinking because I didn’t really know where to start. Thankfully, a user recently requested this very functionality in bug #2846. He even included a link to existing work that accomplishes this very task.

I will be integrating transliteration of Unicode to ASCII into calibre soon. In the meantime, here is a script and a set of classes that accomplish this task outside of calibre (see unidecoder for a better method). This is my Python port of the Ruby unidecode gem, which is itself a port of the original Perl Text::Unidecode.

The major differences between my implementation and the others are that it is written in Python and that it uses a single dictionary instead of loading the code group files as needed.
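
To give a rough idea of what the single-dictionary approach looks like, here is a minimal Python 3 sketch. It is not the actual script: the `TRANSLIT` table is a tiny hand-picked sample rather than the full data set, and the `asciize` function name and the `?` placeholder for unmapped characters are made up for illustration.

```python
# Illustrative sketch only: a tiny sample table, not the real data,
# which covers far more code points.
TRANSLIT = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00E9: "e",    # e with acute accent
}


def asciize(text, table=TRANSLIT, unknown="?"):
    """Return text with every non-ASCII character replaced via one dict lookup."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII passes through untouched.
            out.append(ch)
        else:
            # Everything else is looked up in the single dictionary;
            # unmapped characters fall back to the placeholder.
            out.append(table.get(code, unknown))
    return "".join(out)


if __name__ == "__main__":
    print(asciize("\u201cHello\u201d \u2026 caf\u00e9"))
```

With one dictionary, all of the data is in memory up front and each character costs a single lookup. The price is that the whole table is loaded even when a text only needs a handful of entries, which is what loading code group files on demand avoids.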

You can find out more on how this all works at http://interglacial.com/~sburke/tpj/as_html/tpj22.html