Calibre Week in Review

PML input had some major changes this week. Thank the user WayneD for helping me out and getting me to actually do the work I’ve been putting off since I introduced PML/eReader as an input format.

There is now a metadata reader for PML and PMLZ. WayneD provided me with a set of regular expressions that can extract the metadata from a metatdata comment within a PML document. I took those regexes and created a metatdata plugin that supports both straight PML files as well as the PMLZ archive file.

The other major change to PML is, I’ve re-written the input parser. It is not longer based on a set of regular expressions. It is now a line oriented simple state machine. When I created the regex parser I intended to replace it at some point in the future with a true parser. The regex based one was simply a quick and dirty way to get PML supported. The new parser is much faster, produces cleaner and more accurate HTML output. It also has the added benefit of reading CX codes and turns them into table of contents entries for PML and PMLZ input. The new parser is much better and I’m not completely finished with it. I still need to add support for v comments (they are currently removed), n codes, and implement font attribute tracking to condense changes (this is how n will be handled).

WayneD did provide me with his Perl based line oriented simple state machine for PML to HTML conversion. I did use one idea from it. Turning footnote and sidebar xml syntax into custom PML tags. I had intended to port his parser to python and use it as a base but when I started looking at it I remembered I don’t know Perl at all and I can’t make heads or tails of Perl code. I have no desire or need to actually learn Perl, so I ended up writing my own parser.