Pablo Rodríguez | 23 Apr 22:38 2012

TEI generated ePubs slower to navigate

Hi there,

I got an e-reader and one of the books I wanted to read was the TEI Lite
Guidelines (generated from
http://www.tei-c.org/release/xml/tei/custom/odd/teilite.odd at
http://www.tei-c.org/ege-webclient/).

Everything works fine, except that the TEI Lite ePub file takes longer
to navigate in the e-reader. This is especially noticeable when you go to
the next page and it begins a new section. In this case it takes more than
10 seconds, whereas going to the next page takes only about a second in
other files.

I experienced this issue for the first time with the Guidelines.epub. I
thought it was caused by the big size of the archive. But after I
experienced the same issue with teilite.epub, I thought it might be the
way ePub files are generated from TEI files.

Comparing two files, teilite.epub and another, bigger file that works
fine, gives the following results:

teilite.epub contains 35 files. Compressed size is 314.2KB and
uncompressed size is 2.8MB, a compression ratio of about 11.06%.

http://www.feedbooks.com/book/4960.epub contains 128 files. Compressed
size is 2.3MB and uncompressed size is 6.7MB, a compression ratio of 35%.
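
Figures like the ones above can be gathered with a short script. The
following is only a sketch: the function name epub_stats is made up for
illustration, and the ratio is taken as compressed size divided by
uncompressed size, which is how the percentages above appear to be
computed. The sizes are summed per entry, so zip headers are not counted
and the result may differ slightly from the size of the archive on disk.

```python
import zipfile


def epub_stats(path):
    """Return file count, compressed/uncompressed bytes and their ratio."""
    with zipfile.ZipFile(path) as archive:
        # Directory entries carry no data, so leave them out of the count.
        entries = [info for info in archive.infolist() if not info.is_dir()]
    compressed = sum(info.compress_size for info in entries)
    uncompressed = sum(info.file_size for info in entries)
    ratio = compressed / uncompressed if uncompressed else 0.0
    return {
        "files": len(entries),
        "compressed": compressed,
        "uncompressed": uncompressed,
        "ratio": ratio,
    }
```

Calling epub_stats("teilite.epub") and printing the "ratio" entry as a
percentage should give a number comparable to the ones quoted here.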

The higher compression rate might be the culprit, although I experience
the problem also with other files that have a much lower compression
rate

stuart yeates | 24 Apr 07:42 2012

Re: TEI generated ePubs slower to navigate

There are several issues that may be intertwined

(1) higher compression ratios are likely to be slower
(2) larger content file sizes (longer textual chapters, larger images) 
are likely to be slower
(3) more complex HTML (unnecessary <div/>s etc) is likely to be slower
(4) more complex CSS / javascript is likely to be slower

These can be checked by:

(1) Uncompressing the epub (if necessary, change .epub to .zip so standard 
zip tools will work) and recompressing it twice, once with the -9 option 
(maximum compression) and once with the -0 option (no compression)

(2) Uncompressing the epub, scaling down the images and recompressing 
(probably need to recompress the original)

(3) Uncompressing, running through jtidy (or similar) and recompressing.

(4) Uncompressing, stripping out the CSS / javascript files and recompressing.
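
Check (1) can also be done without command-line zip tools. The sketch
below is one possible way, not the method used by anyone in this thread:
repack_epub is a hypothetical helper that copies an epub and stores every
entry either uncompressed (level 0) or deflated at the given level. It
assumes Python 3.7 or later (for the compresslevel argument) and keeps
the "mimetype" entry first and uncompressed, as the EPUB container format
expects.

```python
import zipfile


def repack_epub(src, dst, level):
    """Copy src to dst, deflating entries at `level` (0 means store only)."""
    with zipfile.ZipFile(src) as old, zipfile.ZipFile(dst, "w") as new:
        names = [info.filename for info in old.infolist() if not info.is_dir()]
        # Stable sort: "mimetype" moves to the front, the rest keep their order.
        names.sort(key=lambda name: name != "mimetype")
        for name in names:
            data = old.read(name)
            if name == "mimetype" or level == 0:
                new.writestr(name, data, compress_type=zipfile.ZIP_STORED)
            else:
                new.writestr(
                    name,
                    data,
                    compress_type=zipfile.ZIP_DEFLATED,
                    compresslevel=level,
                )
```

Running it twice, for example repack_epub("teilite.epub", "stored.epub", 0)
and repack_epub("teilite.epub", "packed.epub", 9), gives two files to
compare on the reader.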

cheers
stuart

On 24/04/12 08:38, Pablo Rodríguez wrote:
> Hi there,
>
> I got an e-reader and one of the books I wanted to read was the TEI Lite
> Guidelines (generated from
> http://www.tei-c.org/release/xml/tei/custom/odd/teilite.odd at

Sebastian Rahtz | 24 Apr 09:47 2012

Re: TEI generated ePubs slower to navigate

On 24 Apr 2012, at 06:42, stuart yeates wrote:

> There are several issues that may be intertwined
> 
> (1) higher compression ratios are likely to be slower
> (2) larger content file sizes (longer textual chapters, larger images) are likely to be slower
> (3) more complex HTML (unnecessary <div/>s etc) is likely to be slower
> (4) more complex CSS / javascript is likely to be slower
> 
I would guess the difference must be (3) or (4), but this
is going to be very hard to pin down.  It may also
depend on which  epub reader is being used.

I am afraid I don't see the problem myself, so don't
really know what to advise.

--
Sebastian Rahtz

stuart yeates | 24 Apr 09:53 2012

Re: TEI generated ePubs slower to navigate

On 24/04/12 19:47, Sebastian Rahtz wrote:
>
> On 24 Apr 2012, at 06:42, stuart yeates wrote:
>
>> There are several issues that may be intertwined
>>
>> (1) higher compression ratios are likely to be slower
>> (2) larger content file sizes (longer textual chapters, larger images) are likely to be slower
>> (3) more complex HTML (unnecessary<div/>s etc) is likely to be slower
>> (4) more complex CSS / javascript is likely to be slower
>>
> I would guess the difference must be (3) or (4), but this
> is going to be very hard to pin down.  It may also
> depend on which  epub reader is being used.
>
> I am afraid I don't see the problem myself, so don't
> really know what to advise.

I've just recalled that there's also:

(5) your zip file is compressed using bzip2 rather than LZ compression

bzip2 gives better compression on large and very large files. Never use 
bzip2 or another BWT-based algorithm where the file may need to be accessed 
incrementally, however, since it typically works in 900k blocks rather 
than 1k blocks.
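
Whether (5) applies to a given file can be checked by listing the
compression method recorded for each zip entry. This is a minimal sketch;
compression_methods and METHOD_NAMES are names invented for the example.
As far as I know, the EPUB container format only allows stored and
deflated entries, so anything else in the output would be worth a closer
look.

```python
import zipfile

# Human-readable names for the method codes known to the zipfile module.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflate",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def compression_methods(path):
    """Map each entry name in the archive to its compression method."""
    with zipfile.ZipFile(path) as archive:
        return {
            info.filename: METHOD_NAMES.get(
                info.compress_type, f"method {info.compress_type}"
            )
            for info in archive.infolist()
        }
```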

cheers
stuart


Sebastian Rahtz | 24 Apr 19:47 2012

Re: TEI generated ePubs slower to navigate

On 24 Apr 2012, at 08:53, stuart yeates wrote:


> I've just recalled that there's also:
>
> (5) your zip file is compressed using bzip2 rather than LZ compression
>
> bzip2 gives better compression on large and very large files. Never use
> bzip2 or another BWT-based algorithm where the file may need to be accessed
> incrementally, however, since it typically works in 900k blocks rather
> than 1k blocks.


interesting. I am doing this stuff from within ant (using http://ant.apache.org/manual/Tasks/zip.html), and
it looks like that gives no control over the compression algorithm. I wonder what to do?

Sebastian

Pablo Rodríguez | 24 Apr 19:29 2012

Re: TEI generated ePubs slower to navigate

On 24/04/12 09:47, Sebastian Rahtz wrote:
> On 24 Apr 2012, at 06:42, stuart yeates wrote:
> 
>> There are several issues that may be intertwined
>>
>> (1) higher compression ratios are likely to be slower
>> (2) larger content file sizes (longer textual chapters, larger images) are likely to be slower
>> (3) more complex HTML (unnecessary <div/>s etc) is likely to be slower
>> (4) more complex CSS / javascript is likely to be slower
>>
> I would guess the difference must be (3) or (4), but this
> is going to be very hard to pin down.  It may also
> depend on which  epub reader is being used.

Many thanks for your help, Stuart and Sebastian.

Removing both print.css and stylesheet.css from teilite.epub (and
removing the references to them from content.opf) makes everything work
like any other file.
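
For anyone who wants to repeat this test, the removal can be scripted.
What follows is a rough sketch under several assumptions, not the exact
steps I took by hand: drop_stylesheets is a made-up name, the manifest
entries are assumed to be self-closing <item .../> elements with
media-type="text/css", and a regular expression stands in for a proper
XML parser. Stylesheet links inside the XHTML files are left untouched.

```python
import re
import zipfile

# Matches a self-closing manifest item that declares a CSS media type.
CSS_ITEM = re.compile(rb'<item\b[^>]*media-type="text/css"[^>]*/>\s*')


def drop_stylesheets(src, dst):
    """Copy src to dst without .css files or their OPF manifest items."""
    with zipfile.ZipFile(src) as old, zipfile.ZipFile(dst, "w") as new:
        for info in old.infolist():
            name = info.filename
            if info.is_dir() or name.lower().endswith(".css"):
                continue  # skip directories and the stylesheets themselves
            data = old.read(name)
            if name.lower().endswith(".opf"):
                data = CSS_ITEM.sub(b"", data)
            # Keep each entry's original compression method and order.
            new.writestr(name, data, compress_type=info.compress_type)
```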

BTW, I have found that both css files include at the beginning:

/*
** Copyright 2008 TEI Consortium

$Id: tei-print.css 6500 2009-06-05 12:24:45Z rahtz $

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
 This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.
 You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the
Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

*/

I guess this is old information, isn't it?

Many thanks for your help again,

Pablo
-- 
http://www.ousia.tk

Sebastian Rahtz | 24 Apr 19:54 2012

Re: TEI generated ePubs slower to navigate

On 24 Apr 2012, at 18:29, Pablo Rodríguez wrote:

> 
> Removing both print.css and stylesheet.css from teilite.epub (and
> removing the references to them from content.opf) makes everything work
> like any other file.

this does not seem an entirely sustainable solution, deleting
the stylesheet :-}

which reader software do you use?
> 
> BTW, I have found that both css files include at the beginning:
> 
> /*
> ** Copyright 2008 TEI Consortium

I'll fix those in a moment

Sebastian

Pablo Rodríguez | 24 Apr 20:11 2012

Re: TEI generated ePubs slower to navigate

On 24/04/12 19:54, Sebastian Rahtz wrote:
> On 24 Apr 2012, at 18:29, Pablo Rodríguez wrote:
>>
>> Removing both print.css and stylesheet.css from teilite.epub (and
>> removing the references to them from content.opf) makes everything work
>> like any other file.
> 
> this does not seem an entirely sustainable solution, deleting
> the stylesheet :-}
> 
> which reader software do you use?

Well, I'm afraid that I don't know. It's the one that comes with the
device (Wolder miBuk γ 6.2).

I don't even know whether ePub is read using Adobe Reader or the
libraries that come with the Linux OS flashed in ROM memory (this is
much worse than Greek to me ;-)). Sorry for not being very informative.

>> BTW, I have found that both css files include at the beginning:
>> /*
>> ** Copyright 2008 TEI Consortium
> 
> I'll fix those in a moment

Thanks again for your help,

Pablo
-- 
http://www.ousia.tk

stuart yeates | 24 Apr 21:09 2012

Re: TEI generated ePubs slower to navigate

On 25/04/12 05:29, Pablo Rodríguez wrote:
> Removing both print.css and stylesheet.css from teilite.epub (and
> removing the references to them from content.opf) makes everything work
> like any other file.

I suggest that you try removing different types of attributes to see which 
particular ones are causing the trouble. It may be that named fonts are 
slow on this reader, or floats, or padding, or something else.

On the NZETC site we strip out lots of CSS attributes when we package it 
into ePubs. Compare for example http://www.nzetc.org/tm/main.css with 
the same file taken from inside one of our ePubs. I don't recall the 
details of exactly why I remove different attributes, sorry.

cheers
stuart

Pablo Rodríguez | 24 Apr 22:16 2012

Re: TEI generated ePubs slower to navigate

On 24/04/12 21:09, stuart yeates wrote:
> On 25/04/12 05:29, Pablo Rodríguez wrote:
>> Removing both print.css and stylesheet.css from teilite.epub (and
>> removing the references to them from content.opf) makes everything work
>> like any other file.
> 
> I suggest that you try removing different types of attributes to see which 
> particular ones are causing the trouble. It may be that named fonts are 
> slow on this reader, or floats, or padding, or something else.

Thanks for the tip, Stuart.

I hope to do this at the weekend, since the whole testing requires time.

> On the NZETC site we strip out lots of CSS attributes when we package it 
> into ePubs. Compare for example http://www.nzetc.org/tm/main.css with 
> the same file taken from inside one of our ePubs. I don't recall the 
> details of exactly why I remove different attributes, sorry.

I tried to compare both files with meld (which is a visual diff), but
that isn't so easy; I'll have to analyze the differences myself.

Many thanks for your help,

Pablo
-- 
http://www.ousia.tk

stuart yeates | 24 Apr 22:24 2012

Re: TEI generated ePubs slower to navigate

On 25/04/12 08:16, Pablo Rodríguez wrote:

>> On the NZETC site we strip out lots of CSS attributes when we package it
>> into ePubs. Compare for example http://www.nzetc.org/tm/main.css with
>> the same file taken from inside one of our ePubs. I don't recall the
>> details of exactly why I remove different attributes, sorry.
>
> I tried to compare both files with meld (which is a visual diff), but
> that isn't so easy; I'll have to analyze the differences myself.

It looks like the ones I'm removing are:

background:
border-bottom:
border-color:
border-style:
border-top:
border-width:
clear:
color:
display:
float:
font-family:
font-size:
font-style:
font-weight:
height:
left:
line-height:
list-style-type:
margin:
margin-bottom:
margin-left:
margin-right:
margin-top:
max-height:
max-width:
min-height:
min-width:
padding:
padding-bottom:
padding-left:
padding-right:
padding-top:
position:
text-align:
text-indent:
text-transform:
width:
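
A filter along these lines can be sketched in a few lines of Python. This
is an illustration only, not the code from my pipeline: strip_properties
is a hypothetical name, and the approach (a regular expression over
brace-delimited blocks, splitting on semicolons) assumes simple
stylesheets with no comments, strings or url() values containing
semicolons or braces inside the rule blocks.

```python
import re


def strip_properties(css, unwanted):
    """Drop declarations whose property name is listed in `unwanted`."""
    unwanted = {name.lower() for name in unwanted}

    def clean_block(match):
        kept = []
        for declaration in match.group(1).split(";"):
            # The property name is everything before the first colon.
            name = declaration.split(":", 1)[0].strip().lower()
            if declaration.strip() and name not in unwanted:
                kept.append(declaration.strip())
        body = "; ".join(kept)
        return "{ " + body + " }" if body else "{}"

    # Only innermost blocks match, so @media wrappers are left alone.
    return re.sub(r"\{([^{}]*)\}", clean_block, css)
```

Property names are compared whole, so removing margin leaves margin-top
in place unless it is listed as well. Passing a shorter list on each run
is also a cheap way to narrow down which properties slow a reader down.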

cheers
stuart

Sebastian Rahtz | 24 Apr 23:12 2012

Re: TEI generated ePubs slower to navigate


On 24 Apr 2012, at 21:24, stuart yeates wrote:

> It looks like the ones I'm removing are:


and I see that I remove line-height, max-width and  height
(as part of tei to epub, I parse and filter the CSS).

I used to get rid of more, but  I think your list is exceedingly
Draconian. Decent readers (i.e. iBooks) cope properly
with most of those.

Sebastian

stuart yeates | 25 Apr 00:52 2012

Re: TEI generated ePubs slower to navigate

On 25/04/12 09:12, Sebastian Rahtz wrote:
>
> On 24 Apr 2012, at 21:24, stuart yeates wrote:
>>
>> It looks like the ones I'm removing are:
>>
> …
>
> and I see that I remove line-height, max-width and height
> (as part of tei to epub, I parse and filter the CSS).
>
> I used to get rid of more, but I think your list is exceedingly
> Draconian. Decent readers (i.e. iBooks) cope properly
> with most of those.

I'm happy being Draconian.

My pipeline was built before iBooks was released and I was more 
concerned with maximum interoperability, both for social inclusion and 
because the library was clearly going to buy ebook readers to loan to 
students but had yet to pick which make / model. I didn't want to have 
to revisit my work after they'd made that call.

cheers
stuart

Pablo Rodríguez | 5 May 11:08 2012

Re: TEI generated ePubs slower to navigate

On 24/04/12 23:12, Sebastian Rahtz wrote:
> 
> On 24 Apr 2012, at 21:24, stuart yeates wrote:
>>
>> It looks like the ones I'm removing are:
>> [...]
> 
> and I see that I remove line-height, max-width and  height
> (as part of tei to epub, I parse and filter the CSS).
> 
> I used to get rid of more, but  I think your list is exceedingly
> Draconian. Decent readers (i.e. iBooks) cope properly
> with most of those.

Sorry for the late reply, Sebastian.

Only one comment about decent readers. iBooks is software for the
iPad, if I've got it right. That is a tablet, and I was talking about
e-ink readers, which have much less processing power than tablets.

Pablo
-- 
http://www.ousia.tk

