Iustin Pop | 4 Dec 15:11 2012
Picon

Is it possible to have constant-space JSON decoding?

Hi,

I'm trying to parse a rather simple but big JSON message, but it turns
out that memory consumption is a problem, and I'm not sure what the
actual cause is.

Let's say we have a simple JSON message: an array of 5 million numbers.
I would like to parse this in constant space, such that if I only need
the last element, overall memory usage is low (yes, unrealistic use, but
please bear with me for a moment).

Using aeson, I thought the following program will be nicely-behaved:

> import Data.Aeson
> import qualified Data.Attoparsec.ByteString.Lazy as AL
> import qualified Data.ByteString.Lazy as L
> 
> main = do
>   r <- L.readFile "numbers"
>   case AL.parse json r :: AL.Result Value of
>     AL.Fail _ context errs -> do
>          print context
>          print errs
>     AL.Done _ d -> case fromJSON d::Result [Value] of
>                      Error x -> putStrLn x
>                      Success d -> print $ last d

However, this uses (according to +RTS -s) ~1150 GB of memory. I've tried
switching from json to json', but while that uses slightly less memory
(~1020 MB) it clearly can't be fully lazy, since it forces conversion to
(Continue reading)

Felipe Almeida Lessa | 4 Dec 15:23 2012
Picon

Re: Is it possible to have constant-space JSON decoding?

Aeson doesn't have an incremental parser so it'll be
difficult/impossible to do what you want.  I guess you want an
event-based JSON parser, such as yajl [1].  I've never used this
library, though.

Cheers,

[1] http://hackage.haskell.org/package/yajl-0.3.0.5

On Tue, Dec 4, 2012 at 12:11 PM, Iustin Pop <iustin <at> google.com> wrote:
> Hi,
>
> I'm trying to parse a rather simple but big JSON message, but it turns
> out that memory consumption is a problem, and I'm not sure what the
> actual cause is.
>
> Let's say we have a simple JSON message: an array of 5 million numbers.
> I would like to parse this in constant space, such that if I only need
> the last element, overall memory usage is low (yes, unrealistic use, but
> please bear with me for a moment).
>
> Using aeson, I thought the following program will be nicely-behaved:
>
>> import Data.Aeson
>> import qualified Data.Attoparsec.ByteString.Lazy as AL
>> import qualified Data.ByteString.Lazy as L
>>
>> main = do
>>   r <- L.readFile "numbers"
>>   case AL.parse json r :: AL.Result Value of
(Continue reading)

Iustin Pop | 4 Dec 15:38 2012
Picon

Re: Is it possible to have constant-space JSON decoding?

On Tue, Dec 04, 2012 at 12:23:19PM -0200, Felipe Almeida Lessa wrote:
> Aeson doesn't have an incremental parser so it'll be
> difficult/impossible to do what you want.  I guess you want an
> event-based JSON parser, such as yajl [1].  I've never used this
> library, though.

Ah, I see. Thanks, I wasn't aware of that library.

So it seems that using either 'aeson' or 'json', we should be prepared
to pay the full cost of input message (string/bytestring) plus the cost
of the converted data structures.

thanks!
iustin
Clark Gaebel | 4 Dec 15:47 2012
Picon
Picon

Re: Is it possible to have constant-space JSON decoding?

Aeson is used for the very common usecase of short messages that need to be parsed as quickly as possible into a static structure. A lot of things are sacrificed to make this work, such as incremental parsing and good error messages. It works great for web APIs like twitter's.

I didn't even know people used JSON to store millions of integers. It sounds like fun.

  - Clark


On Tue, Dec 4, 2012 at 9:38 AM, Iustin Pop <iustin <at> google.com> wrote:
On Tue, Dec 04, 2012 at 12:23:19PM -0200, Felipe Almeida Lessa wrote:
> Aeson doesn't have an incremental parser so it'll be
> difficult/impossible to do what you want.  I guess you want an
> event-based JSON parser, such as yajl [1].  I've never used this
> library, though.

Ah, I see. Thanks, I wasn't aware of that library.

So it seems that using either 'aeson' or 'json', we should be prepared
to pay the full cost of input message (string/bytestring) plus the cost
of the converted data structures.

thanks!
iustin

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Iustin Pop | 4 Dec 15:58 2012
Picon

Re: Is it possible to have constant-space JSON decoding?

On Tue, Dec 04, 2012 at 09:47:52AM -0500, Clark Gaebel wrote:
> Aeson is used for the very common usecase of short messages that need to be
> parsed as quickly as possible into a static structure. A lot of things are
> sacrificed to make this work, such as incremental parsing and good error
> messages. It works great for web APIs like twitter's.

I see, good to know.

> I didn't even know people used JSON to store millions of integers. It
> sounds like fun.

Actually, that's not the actual use case :), this was just an example to
show memory use with trivial data structures (where strictness/lazyness
is easier to reason about).

In my case, I have reasonably-sized message of complex objects, which
results in the same memory profile: cost of input message (as
String/ByteString) plus cost of the converted objects. What bothers me
is that I don't seem to be able to at least remove the cost of the input
data after parsing, due to non-strict types I convert to.

thanks,
iustin
Herbert Valerio Riedel | 4 Dec 16:15 2012
Picon

Re: Is it possible to have constant-space JSON decoding?

Clark Gaebel <cgaebel <at> uwaterloo.ca> writes:

[...]

> I didn't even know people used JSON to store millions of integers. It
> sounds like fun.

Actually, JSON is quite convenient if you need a standardized common
interchange format between Python, Ruby, JS et al. based components as
it directly maps to a common subset of primitive data-structure
available in those languages (i.e. bool,strings,numbers,arrays,objects)
and also very efficient JSON decoders are available by now (maybe even
part of the respective standard library)

So I'm actually struggling myself to find a way to get large JSON text
parsed in Haskell with a comparable memory footprint to e.g. Python.

As for generating large JSON document, I've found in 'json-builder' a
more memory efficient alternative to 'aeson'.
Herbert Valerio Riedel | 4 Dec 15:58 2012
Picon

Re: Is it possible to have constant-space JSON decoding?

Iustin Pop <iustin <at> google.com> writes:

[...]

> Let's say we have a simple JSON message: an array of 5 million numbers.
> I would like to parse this in constant space, such that if I only need
> the last element, overall memory usage is low (yes, unrealistic use, but
> please bear with me for a moment).
>
> Using aeson, I thought the following program will be nicely-behaved:

part of the problem is that aeson builds an intermediate JSON parse-tree
which has quite an overhead for representing a list of numbers on the
heap as each numbers requires multiple heap objects (see also [1]). This
is an area where e.g. Python has a significantly smaller footprint
(mostly due to a more efficient heap representation).

[...]

> It seems that the Array constructor holds a vector, and this results in
> too much strictness?

btw, using a list on the other hand would add an overhead of 2 words
(due to the ':' constructor) for representing each JSON array element in
the parse-tree, that's probably why aeson uses vectors instead of lists.

[...]

cheers,
  hvr

 [1]: https://github.com/bos/aeson/issues/22
Iustin Pop | 4 Dec 16:23 2012
Picon

Re: Is it possible to have constant-space JSON decoding?

On Tue, Dec 04, 2012 at 03:58:24PM +0100, Herbert Valerio Riedel wrote:
> Iustin Pop <iustin <at> google.com> writes:
> 
> [...]
> 
> > Let's say we have a simple JSON message: an array of 5 million numbers.
> > I would like to parse this in constant space, such that if I only need
> > the last element, overall memory usage is low (yes, unrealistic use, but
> > please bear with me for a moment).
> >
> > Using aeson, I thought the following program will be nicely-behaved:
> 
> part of the problem is that aeson builds an intermediate JSON parse-tree
> which has quite an overhead for representing a list of numbers on the
> heap as each numbers requires multiple heap objects (see also [1]). This
> is an area where e.g. Python has a significantly smaller footprint
> (mostly due to a more efficient heap representation).

Ah, I see. Thanks for the link, so that's from where the 'S' constructor
was coming from in the -hd output.

And indeed, I was surprised as well that Python has a more efficient
representation for this.

> [...]
> 
> > It seems that the Array constructor holds a vector, and this results in
> > too much strictness?
> 
> btw, using a list on the other hand would add an overhead of 2 words
> (due to the ':' constructor) for representing each JSON array element in
> the parse-tree, that's probably why aeson uses vectors instead of lists.

Ack.

thanks,
iustin

Gmane