Rile.js - HTML5 EPUB reader
What is it?
It’s a small HTML5-based EPUB document reader. I created it partly for my fiancée’s writing blog, and partly because I had never worked with the EPUB format and didn’t know anything about it.
Right now the reader is in an early alpha stage and works on Google Chrome and Mozilla Firefox. The next step is to get it working on IE9/10 and other browsers.
For those who don’t know what an EPUB document is: EPUB is a standardized open format for electronic publications. An EPUB file is a set of XML documents packed together into a single ZIP archive. The most important properties of this format are:
- An EPUB document is not divided into pages; the reading software decides how to divide the content into pages and how to display it.
- It supports CSS.
- It supports SVG and raster images.
Right now my goal is to create a simple reader that supports as many simple EPUB documents as possible, so that anyone (e.g. a writer) can embed an EPUB document on their blog or website.
Time for a short technical summary.
Every EPUB document can have its own style sheet, and there is no guarantee that it will not leak CSS properties into the reader’s UI. For example, if the EPUB document has a style sheet that sets CSS properties on the html element, we could end up breaking our website or the reader.
Therefore, we need something that gives our EPUB styles their own scope. If we lived in the future, we could just use web components for that (http://www.w3.org/TR/2013/WD-components-intro-20130606/), but unfortunately we don’t. What we can use right now is the next best thing: a good old IFRAME element.
We can create an IFRAME element dynamically and then access its document from script.
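A minimal sketch of that step, assuming the frame is appended to some container element on the page. createContentFrame and its parameters are hypothetical names, not Rile.js’s actual API; the document is passed in as a parameter only to keep the helper easy to test.

```javascript
// Hypothetical helper (not Rile.js's real code): create an IFRAME on the fly
// and return its document, so EPUB styles stay scoped to that document.
function createContentFrame(doc, container) {
  const iframe = doc.createElement("iframe");
  iframe.style.border = "none";
  // The frame only gets its own document once it is attached to the page.
  container.appendChild(iframe);
  // contentDocument is the standard property; contentWindow.document is the
  // older route some browsers needed.
  const frameDoc = iframe.contentDocument || iframe.contentWindow.document;
  return { iframe, frameDoc };
}
```

In a page this would be called as createContentFrame(document, someContainerElement), after which frameDoc can be filled with the EPUB content and its style sheets.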
Now the best part: to use an IFRAME, we don’t need it to point to an existing HTML document. If we leave the SRC attribute empty, most browsers will create a document inside the IFRAME and fill it with a basic HTML structure (<html><head/><body/></html>) automatically, and for those browsers that are not so cool (you know which ones) there is always a simple solution.
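One common way to handle those browsers is to write the skeleton into the frame’s document yourself with document.open(), write() and close(). The sketch below assumes that approach (the post does not spell out its exact fix), and ensureFrameSkeleton is a hypothetical name.

```javascript
// Hypothetical helper: make sure the IFRAME's document has <html>, <head>
// and <body>. Browsers that already created them are left untouched.
function ensureFrameSkeleton(frameDoc) {
  if (!frameDoc.body) {
    // Replace whatever (empty) content the frame has with a basic document.
    frameDoc.open();
    frameDoc.write("<!DOCTYPE html><html><head></head><body></body></html>");
    frameDoc.close();
  }
  return frameDoc;
}
```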
To sum up: when the reader is placed on a web page, the first thing it does is search for every EMBED element with the appropriate content type (application/epub+zip) and replace it with the reader’s GUI, which then loads and unzips the EPUB document and puts its content into an IFRAME document.
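The EMBED lookup can be sketched with querySelectorAll and an attribute selector. This assumes the EPUB location is stored in the EMBED’s src attribute; mountReaders and the createReaderUI callback are hypothetical, standing in for whatever actually builds the reader’s GUI and starts loading the book.

```javascript
// Hypothetical sketch: swap every <embed type="application/epub+zip"> for a
// reader UI element produced by the supplied createReaderUI(doc, src) callback.
function mountReaders(doc, createReaderUI) {
  // querySelectorAll returns a static list, so replacing elements while
  // looping over it does not skip any of them.
  const embeds = Array.from(doc.querySelectorAll('embed[type="application/epub+zip"]'));
  const readers = [];
  for (const embed of embeds) {
    const src = embed.getAttribute("src"); // where the .epub file lives
    const ui = createReaderUI(doc, src);
    embed.parentNode.replaceChild(ui, embed);
    readers.push(ui);
  }
  return readers;
}
```

With markup such as <embed src="book.epub" type="application/epub+zip">, calling mountReaders(document, buildReader) once the DOM is ready would replace it with whatever element buildReader returns.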
There was a small problem with displaying images. There was no way to embed an image just by writing <img src="zip:content/myImage.png"/>, because every image was zipped and available only as raw data. Fortunately, nowadays we can use data URIs, which are very well supported by browsers (if we ignore IE8) (http://caniuse.com/#search=uri). A data URI lets us display an image by embedding the raw image data as a base64-encoded string directly in the src attribute, in the form data:[MIME type];base64,[encoded data].
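A sketch of turning unzipped image bytes into a data URI. It assumes the unzip step yields a Uint8Array and that the MIME type can be guessed from the file extension; toDataUri and the small lookup table are illustrative assumptions (a real reader could take the media type from the EPUB manifest instead).

```javascript
// Illustrative extension-to-MIME table (an assumption, not exhaustive).
const IMAGE_MIME_TYPES = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  svg: "image/svg+xml",
};

// Hypothetical helper: build a data URI from a zip entry path and its raw bytes.
function toDataUri(path, bytes) {
  const ext = path.split(".").pop().toLowerCase();
  const mime = IMAGE_MIME_TYPES[ext] || "application/octet-stream";
  // btoa() expects a "binary string" (one character per byte). Build it in
  // chunks so String.fromCharCode.apply never gets too many arguments.
  let binary = "";
  const CHUNK = 0x8000;
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return "data:" + mime + ";base64," + btoa(binary);
}
```

The result can go straight into an image, e.g. img.setAttribute("src", toDataUri("content/myImage.png", bytes)).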
The Range Object
Because there are no defined pages in the document, we need to create them ourselves by slicing the content. To do that, we iterate through all HTML elements and check where the content overflows our container (that is, how much content is needed to fill one page). This is done by creating a Range object and setting its start and end nodes.
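A sketch of that step with the standard Range methods createRange(), setStart(), setEnd() and getBoundingClientRect(). measureRange is a hypothetical helper; the nodes are assumed to live in a rendered document (here, the IFRAME’s), since otherwise there is no layout to measure.

```javascript
// Hypothetical helper: create a Range between two boundary points and
// report how much vertical space the content it covers takes up.
function measureRange(doc, startNode, startOffset, endNode, endOffset) {
  const range = doc.createRange();
  range.setStart(startNode, startOffset);
  range.setEnd(endNode, endOffset);
  // getBoundingClientRect() returns the box enclosing everything the range
  // covers, so its height is the height of this slice of content.
  return { range, height: range.getBoundingClientRect().height };
}
```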
In a loop we move the start and end nodes and their offsets, checking whether the height of the range is equal to or greater than the height of our container. If it is, we can clone the range with the cloneContents() method. To get the next page, we set start_node and its offset to the end_node and end_offset values, and repeat the process.
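Putting the loop together, a sketch under a few assumptions: points is a hypothetical, pre-collected array of {node, offset} boundary positions in document order (for instance every word boundary in the text nodes), and pageHeight is the container’s height. It deliberately differs a little from the description above: instead of cloning the first range that reaches the container height, it clones the last range that still fits, so a page never overflows.

```javascript
// Hypothetical sketch of the slicing loop, not Rile.js's actual code.
function slicePages(doc, points, pageHeight) {
  const pages = [];
  let start = 0; // index in `points` where the current page begins
  while (start < points.length - 1) {
    let lastFit = start + 1;
    // Move the end boundary forward until the range gets taller than a page.
    for (let end = start + 1; end < points.length; end++) {
      const range = doc.createRange();
      range.setStart(points[start].node, points[start].offset);
      range.setEnd(points[end].node, points[end].offset);
      if (range.getBoundingClientRect().height > pageHeight) break;
      lastFit = end;
    }
    // Clone the last range that fit. lastFit starts one step ahead, so a
    // single oversized element still advances the loop instead of stalling it.
    const page = doc.createRange();
    page.setStart(points[start].node, points[start].offset);
    page.setEnd(points[lastFit].node, points[lastFit].offset);
    pages.push(page.cloneContents());
    // The next page starts where this one ended.
    start = lastFit;
  }
  return pages;
}
```

Scanning forward one boundary at a time is simple but costs one layout measurement per step; a binary search over points would need far fewer measurements on long chapters.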
Things that still need to be done:
- Write tests, many tests.
- Better page slicing: the slicing methods still need some work, but they are pretty close to doing what they should.
- Asynchronous page slicing, because nobody likes it when the UI freezes on large documents.
- It would be great if the reader remembered the current page between page reloads.
- Support for other browsers (for example IE9/10)