I’m the first person to acknowledge that loading epub and HTML files in Sigil 0.1.x is slow. Very slow. Abysmally.
Why is that? Well, there are two components in the 0.1.x loading process:
- Extracting the epub file, reading the OPF, running Tidy on the source, updating resource references etc. Let’s call this the “file load”.
- After the file load creates one large HTML flow, this is then sent to the Book View (integrated QtWebkit) for rendering. Let’s call this the “QtWebkit load”.
You can tell when the file load has finished and the QtWebkit load is starting: the moment you see “File Loaded” in the status bar is the moment all Sigil code stops executing. From then on, it’s up to QtWebkit to render the page.
QtWebkit load takes longer, by far. Using the Three Men in a Boat epub file from the MobileRead ebook uploads forum as a reference, the whole loading procedure takes 75 seconds in Sigil 0.1.9. Way, way too long.
Of these 75 seconds, only 14.5 are spent in the file load. The rest is all QtWebkit, so not something I can directly influence. The QtWebkit load used to be more than twice as fast in Sigil 0.1.5, but the subsequent versions of Sigil include Qt 4.6 (instead of 4.5), and in that version QtWebkit is much slower. Nokia developers admit they introduced some major performance regressions and are currently working on fixing that.
I have no intention of waiting for that to happen. It was bad before, but now it’s horrible. I considered it far too slow in Qt 4.5, and now?… So how about I come up with a way to work around this problem?
Currently, Sigil takes in all of the XHTML files in an epub, puts them all together and displays them as one. So you get one large “flow” where you can do all the editing. I chose this model because it’s the one used in the popular Book Designer (which doesn’t support epubs).
This is where Sigil 0.2.0 comes in.
Sigil no longer does that. All the original XHTML files are preserved and are edited one by one. Since there is no one huge flow, QtWebkit rendering performance goes up tremendously (since there is less to render).
Now, when Sigil loads your epub file, the first XHTML file by reading order is loaded in the initial tab. Since this first XHTML file is usually a cover page, it takes less than half a second to render. So now instead of 60 seconds for the QtWebkit load, you get 0.2 seconds (for the TMB file).
Great, huh? :)
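For reference, “first by reading order” means the first entry in the spine of the epub’s OPF file; the spine lists manifest item ids in the order they should be read. Here is a minimal Python sketch of that lookup, assuming an OPF 2.0 package document. The function name `first_spine_href` and the sample OPF are made up for illustration, and this is not Sigil’s actual code (Sigil is written in C++ with Qt).

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def first_spine_href(opf_xml: str) -> str:
    """Return the href of the first document in reading order."""
    root = ET.fromstring(opf_xml)
    # The manifest maps item ids to file paths (relative to the OPF).
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists item ids in reading order; take the first one.
    first = root.find("opf:spine/opf:itemref", OPF_NS)
    if first is None:
        raise ValueError("OPF has an empty spine")
    return hrefs[first.get("idref")]

# Hypothetical sample: the manifest order differs from the reading order.
SAMPLE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="cover" href="cover.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="cover"/>
    <itemref idref="ch1"/>
  </spine>
</package>"""

first_document = first_spine_href(SAMPLE_OPF)
```

Only that one file has to be handed to the renderer when the book opens; the other files can be rendered when their tabs are opened.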
But then again, there are those 14.5 seconds for the file load. It would be great if I could get that down.
Most of that time is spent doing two things: running Tidy on the large concatenated HTML document, and updating the resource reference paths. The updating process takes much longer.
The resource updating process is necessary since Sigil 0.1.x renames your resource files. Since images, CSS files, HTML files, fonts etc. now have different names, all the HTML tags and style rules referencing them have to be updated. This takes a long time.
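To make “updating the references” concrete, here is a minimal Python sketch, assuming we already have a mapping from each old resource path to its new one. The function name `update_references` and the file names are invented for the example, and it replaces matches anywhere in the text; a real implementation would presumably restrict itself to attribute values and CSS `url(...)` references. It is not the code Sigil uses.

```python
import re

def update_references(source: str, renames: dict[str, str]) -> str:
    """Replace every old resource path in `source` with its new path."""
    if not renames:
        return source
    # Longest paths first, so a longer path wins over one that is its prefix.
    alternatives = sorted(renames, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(path) for path in alternatives))
    # One pass over the text; each match is looked up in the mapping.
    return pattern.sub(lambda match: renames[match.group(0)], source)

# Hypothetical old -> new names, purely for illustration.
renames = {
    "../images/cover.jpg": "../Images/img0001.jpg",
    "../style.css": "../Styles/style0001.css",
}
html = '<img src="../images/cover.jpg"/><link href="../style.css"/>'
updated_html = update_references(html, renames)
```

Every image, stylesheet and font reference in every document has to go through something like this, which is why a book with many images spends so long in this step.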
In Sigil 0.2.0, there are now multiple XHTML files, and they all have to be updated the same way. The original resource filenames are now preserved, but the file structure changes so we still need to update the paths. All the XHTML files have different content, but the same path updates apply to all of them. This means we can now parallelize this:
- We create a thread pool with one thread per logical CPU on the system;
- We split the updating process into “tasks”, where each task represents the required update operations to be performed on each XHTML file;
- We let the threads munch the tasks as they become ready to process them.
So if you have a dual core system like I do, two different threads execute two tasks at the same time. As each thread finishes the task it has been working on, it arbitrarily picks a new one and works on that. So the more logical CPUs you have, the more threads you can run, the more tasks your computer can work on at the same time, and the faster the file loading will be.
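The scheme above can be sketched in a few lines of Python with the standard `concurrent.futures` module. This is only an illustration of the structure (one pool, one task per XHTML file, idle threads taking the next task); `update_all` and the sample data are hypothetical names, and Sigil itself does this in C++. Note that CPython threads do not run Python code on several cores at once, so this shows the shape of the design, not the speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def update_all(files: dict[str, str], update_one) -> dict[str, str]:
    """Run `update_one` on every XHTML source, one task per file."""
    workers = os.cpu_count() or 1  # logical CPUs; cpu_count() may return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit() queues one task; an idle thread picks up the next one.
        futures = {
            name: pool.submit(update_one, text) for name, text in files.items()
        }
        # result() blocks until that file's task has finished.
        return {name: future.result() for name, future in futures.items()}

# Hypothetical input: file name -> XHTML source.
book = {
    "cover.xhtml": "<img src='a.png'/>",
    "ch1.xhtml": "<img src='b.png'/>",
}
updated = update_all(book, lambda text: text.replace(".png", ".jpg"))
```

Because each task touches only its own file, the threads never need to coordinate with each other while they work, which is what makes this kind of split cheap.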
I then plugged the old updating subsystem into this multi-threaded architecture and ran it on my dual core. The file load on TMB dropped to 11 seconds. Not quite the ideal linear behavior, but that’s to be expected since there’s overhead in talking to the threads, managing the task pool etc. And the OS eats your cores too, so your threads can’t stay active all the time. Also, not everything in the file load can be parallelized; lots of things have to stay sequential.
The other major problem is that TMB has a huge number of images, meaning that many HTML “<img>” elements have to be updated. With a more conservative epub file, the numbers would be even better.
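As a rough back-of-the-envelope reading of those numbers (an estimate, not a measurement), Amdahl’s law can be applied under the simplifying assumption that a fraction p of the file load parallelizes perfectly across n cores and all threading overhead is ignored:

```latex
T(n) = T(1)\left[(1 - p) + \frac{p}{n}\right]

\frac{T(2)}{T(1)} = \frac{11}{14.5} \approx 0.76 = 1 - \frac{p}{2}
\quad\Rightarrow\quad p \approx 0.48

T(4) \approx 14.5\left[(1 - 0.48) + \frac{0.48}{4}\right] \approx 9.3\ \text{s}
```

In other words, under these assumptions only about half of the file load is behaving as parallel work, and this same code on four cores would land somewhere around 9 seconds rather than a quarter of the original time.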
But a roughly 25% improvement on a measly dual core isn’t half bad. It would certainly be faster on a quad. But I can bring that down even further, I know I can.
So I spent about six hours in front of a code profiler and Visual Studio, tracing the bottlenecks and optimizing the “hot” paths. The major bottleneck was—as expected—the large and cumbersome resource updating subsystem. After rewriting it in what must have been ten different ways (each version slightly faster than the previous), I came up with the final design.
For the sake of reference, my profiler says that the old version takes on average 470 milliseconds to run through one XHTML file in TMB. After six hours messing with it, the final version takes 15 milliseconds. That’s 31 times faster.
File load for TMB? It’s now 3.3 seconds. Including the 0.3 for the rendering of the cover page, it’s 3.6.
So from 75 seconds in Sigil 0.1.9 down to 3.6 seconds in the development version of 0.2.0, I think I’ve done a pretty good job improving the loading speed.
For epubs with a “normal” number of images and computers with more logical CPUs, it’s even faster.
| epub name | Time – 0.1.9 (s) | Time – dev0.2.0 (s) |
|---|---|---|
| Three Men in a Boat | 75 | 3.6 |
| Sylvie and Bruno | 82.5 | 6 |
| Savage Stories of Conan | 90.2 | 4.5 |
These are all x86 times. For x64, knock off 10%.
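To turn the table into speedup factors, here is a small Python sketch. The timings are copied from the table; the `speedup` helper is just an illustrative name.

```python
# Load times in seconds, copied from the table: (Sigil 0.1.9, dev 0.2.0).
timings = {
    "Three Men in a Boat": (75.0, 3.6),
    "Sylvie and Bruno": (82.5, 6.0),
    "Savage Stories of Conan": (90.2, 4.5),
}

def speedup(old: float, new: float) -> float:
    """How many times faster the new load is than the old one."""
    return old / new

for name, (old, new) in timings.items():
    print(f"{name}: {speedup(old, new):.1f}x faster")
```

That works out to somewhere between roughly 14 and 21 times faster, depending on the book.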
 “Render” means that the colors for the pixels on the screen have to be calculated, i.e. the screen has to be “painted”.
 Written by Jerome K. Jerome and painstakingly hand-crafted by MobileRead user zelda_pinwheel. It’s a great book, you should read it. It’s also an amazing epub file; I use it as my main reference during Sigil development.
 x86 Windows version of Sigil, on Windows 7 x64. Computer is a Core 2 Duo 6400 with 4GB RAM.
 But horrible.
 Logical CPUs are the actual cores on your system plus any “virtual” cores from HyperThreading.