This week has been a productive one. I’ve made a lot of small GUI enhancements and did some work on PDF input as well. All of these changes have not made it into trunk yet. This is mainly because Kovid has been away this week.
I’ve added auto complete to a number of the input control on the GUI. Authors, Publisher, and Tags all auto complete pretty much everywhere now. The Tags will even auto complete in the table view in the main window. However, Authors, Series and Publisher do not auto complete in the main windows as of yet.
I’ve also been working with the GUI’s search. ISBN, Rating, Cover fields are all included in the default search. They are also search field identifiers. Meaning you can do isbn:123 to search just isbn numbers. Searching for empty and filled fields has been implemented as well. Use field:false and field:true respectively.
PDF input, either is or last week, got the ability to specify an unwrap factor for unwrapping lines. Previously this was a fixed value. Now it can be changed by the user. I have some ideas to enhance this further but I’m not going to to into detail because they may not materialize. Use the option –unwrap-factor with a decimal value 0 - 1. It is used by the regular expression that determines the minimum line length required for unwrapping.
PDF input had another highly requested change. The ability to remove headers and footers. However, it’s not as user friendly as I would like. There are four new options in total. –remove-header, –remove-footer, –header-regex, and –footer-regex. If the –remove-* options are used then a regular expression that can be customized by using –*-regex is used to match headers and footers. The header and footer matching happens before all other processing rules. Use the ebook-convert’s –debug-input option to see the HTML that the regex will be matched against.
$ ebook-convert input.pdf .epub --debug-input output_dir/