jidanni | 12 Dec 2010 02:37

[emacs-w3m:11433] some pages chopped in w3m

$ set http://gmane.org/lists.php
$ lynx -dump $@ | wc
  65996  188608 3338251
$ w3m  -dump $@ | wc
     26     121    1617
Why does that occur?
w3m 0.5.2-10

Brian Keck | 12 Dec 2010 05:56

[w3m-dev-en 01141] Re: some pages chopped in w3m


On Sun, 12 Dec 2010 09:37:22 +0800, jidanni <at> jidanni.org wrote:
>$ set http://gmane.org/lists.php
>$ lynx -dump $@ | wc
>  65996  188608 3338251
>$ w3m  -dump $@ | wc
>     26     121    1617
>Why does that occur?
>w3m 0.5.2-10

It's mainly a single-row 3-column table whose 3rd column has a <ul> with
12,407 items.

If you delete the table related tags so you just have a 12,407 item <ul>, then
w3m shows the whole list.

If you delete the 1st & 2nd columns it still fails.  If, in that
single-column (3rd column only) version, you keep only the first 85% of
the items, it works.  If you keep the first 86%, it fails.

So it seems w3m can't handle tables containing lists of more than x items,
where x is somewhere between 10554 & 10678.
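
A rough way to repeat that bisection without the live page is to generate
a one-cell table holding an N-item <ul> and count the lines w3m renders.
This is only a sketch: make_page, dump_lines, and the "item N" labels are
made-up names for illustration, and the real list items are links with
longer text, so if the limit depends on size rather than item count, the
threshold seen here may differ.

```shell
#!/bin/sh
# make_page N: print an HTML page whose single table cell holds an
# N-item <ul>. (Hypothetical helper; item text is a placeholder.)
make_page() {
    n=$1
    echo '<html><body><table><tr><td><ul>'
    i=1
    while [ "$i" -le "$n" ]; do
        echo "<li>item $i</li>"
        i=$((i + 1))
    done
    echo '</ul></td></tr></table></body></html>'
}

# dump_lines N: render the generated page with w3m and count output lines.
# -T text/html makes w3m treat standard input as HTML.
dump_lines() {
    make_page "$1" | w3m -dump -T text/html | wc -l
}
```

Running `dump_lines 10554` and `dump_lines 10678` and comparing the counts
with N should show whether the synthetic page is chopped in the same range.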

Haven't looked at the source ...

Brian Keck

jidanni | 12 Dec 2010 12:50

[emacs-w3m:11435] Re: [w3m-dev-en 01140] some pages chopped in w3m

>>>>> "BK" == Brian Keck <bwkeck <at> gmail.com> writes:...

BK> So it seems w3m can't handle tables containing lists of more than x items,
BK> where x is somewhere between 10554 & 10678.

Thanks for figuring it out. Let's hope they fix it.

