Calibre Week in Review

A lot of work went into eReader and PML to have it supported better. Also, a new format has been added.

The XHTML to PML parser has been completely rewritten. It is based on the XHTML to FB2 parser I wrote for FB2 output. It produces much better looking PML markup and the displayed output looks very close to the original XHTML source. One major advantage of the new parser is that it accounts for XHTML style information and translates that into PML tags. For example if text is set to bold by CSS then the text will get the bold PML tag.

eReader also got another very important addition. Support for Makebook (202 bye header) file input. Makebook and Dropbook are the two applications provided by eReader (the company) for producing eReader files. Makebook is the older application that is no longer supported. Makebook and Dropbook produce very different record 0 headers. This header has information about where the text, images and other things contained in the file are located. It took a while but I’ve been able to understand enough of the Makebook header to add input support for these files.

Makebook produces a 202 byte header while Dropbook produces a 132 byte header. After comparing header values and section sizes I was able to determine that the 2 byte int at offset 0x08 contained the start of the non-text offset. Just like the 132 byte header files, everything before this offset is text.

Images in the 202 byte header files were easy to find because they are in the same format as the Dropbook produced files. However, I didn’t bother to determine if there was a header value. Since all images are in PNG format and the their section start with the text PNG, I simply loop though all non-text sections and see if they start with PNG. If they do I know it’s an image and extract it.

The hardest part of the 202 byte header files was the text itself. Even though I knew which sections contained the text I didn’t know how it was compressed. This is where Google came to the rescue. On the homepage for the Z-DOC PalmPilot application I found there was some work to reverse engineer this older format. This page gave me the information I was looking for. Text is PalmDoc compressed and then xored with 0xA5. It looks like this xor is an attempt to obfuscate the compression used to make it harder to decompress. It isn’t for copy protection because the Makebook application only produces non-DRM files. DRM eReader files from that time would be created in a different manner.

Syncing news now supports auto convert in the GUI. It’s just like auto convert with sending email and sending an eBook to a device. If the book is not in a format supported by the device it will be auto converted to a supported format based on user preference.

The final bit of work this week was support for the RocketBook (RB) format. Both input and output are working. Though they both do need testing. Output in particular as I don’t have a device that supports these files so I can only guess based on my input code that the RB files produced are 100% correct. If someone has a device that reads RB files please let me know if the output files are read correctly.