Steven McCoy | 31 Mar 03:23 2012

BSON as high performance serialisation

Interesting tidbit from a YouTube presentation,


Serialization formats - no matter which one you use, they are all expensive. Measure. Don’t use pickle. Not a good choice. Found protocol buffers slow. They wrote their own BSON implementation which is 10-15 times faster than the one you can download.
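
The advice to "measure" can be followed with a very small harness. The sketch below is only an illustration using Python's standard library: the sample message, the repeat count and the helper name time_round_trip are assumptions, not anything from the presentation. It times encode+decode round trips for json and pickle on one message.

```python
import json
import pickle
import timeit


def time_round_trip(name, dumps, loads, message, repeat=2000):
    """Return (name, seconds) for `repeat` encode+decode round trips."""
    def round_trip():
        loads(dumps(message))
    return name, timeit.timeit(round_trip, number=repeat)


# Hypothetical sample message; substitute a real one from your own system.
message = {"id": 12345, "user": "alice", "tags": ["a", "b", "c"], "score": 0.75}

results = dict([
    time_round_trip("json", json.dumps, json.loads, message),
    time_round_trip("pickle", pickle.dumps, pickle.loads, message),
])

# Print the fastest format first.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```

The numbers depend heavily on message shape, library version and language binding, so the ranking from one message should not be generalised.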


BSON is an initialism for Binary-JSON.


-- 
Steve-o
_______________________________________________
zeromq-dev mailing list
zeromq-dev <at> lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Marten Feldtmann | 31 Mar 09:57 2012

Re: BSON as high performance serialisation

What I find really interesting is the TNetString approach Mongrel2 is 
using instead of json.
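
For readers who have not seen the format: a tnetstring is a length prefix, a colon, the payload, and a one-byte type tag. The encoder below is a minimal sketch of that layout for a few types, not Mongrel2's implementation; the function name tnet_dumps is made up for illustration and floats are left out.

```python
def tnet_dumps(value):
    """Encode a Python value as a tnetstring (bytes).

    Layout: <payload length in ASCII digits> ':' <payload> <type tag>
    """
    if value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, bool):  # bool must be checked before int
        payload, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, str):
        payload, tag = value.encode("utf-8"), b","
    elif isinstance(value, list):
        payload, tag = b"".join(tnet_dumps(item) for item in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(
            tnet_dumps(key) + tnet_dumps(item) for key, item in value.items()
        )
        tag = b"}"
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return str(len(payload)).encode("ascii") + b":" + payload + tag


print(tnet_dumps({"hello": [12345678901, "this"]}))
```

Because every value carries its length up front, a reader can skip or slice a value without scanning for delimiters or handling escapes, which is the main difference from JSON.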

Marten

On 31.03.2012 03:23, Steven McCoy wrote:
> Interesting tidbit from a YouTube presentation,
>
>     *Serialization formats* - no matter which one you use, they are all
Wolfgang Richter | 31 Mar 23:01 2012

Re: BSON as high performance serialisation

The only issue with BSON is that it's not entirely generic---the spec
has types that are specific to MongoDB (like DBPointer, stuff to ship
JavaScript code with context, etc.).

This comes from my experience implementing BSON in C.  And, yes, I'd
believe it could be fast (maybe not with string-based keys though?).
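
To make the string-key point concrete, here is a minimal sketch of the BSON wire layout for a flat document with only string and int32 values. It is not the C implementation mentioned in this message; the helper names are made up for illustration, and the MongoDB-specific types are ignored.

```python
import struct


def bson_cstring(text):
    """BSON key names are NUL-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"


def bson_element(key, value):
    """Encode one element: type byte, key cstring, then the value."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # Type 0x02: int32 byte length (including the NUL), then the bytes.
        return b"\x02" + bson_cstring(key) + struct.pack("<i", len(data)) + data
    if isinstance(value, int):
        # Type 0x10: little-endian int32.
        return b"\x10" + bson_cstring(key) + struct.pack("<i", value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_document(fields):
    """Encode a flat dict: int32 total size, elements, trailing NUL."""
    body = b"".join(bson_element(key, value) for key, value in fields.items())
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


print(bson_document({"hello": "world"}))
```

Every element repeats its full key name on the wire, so a decoder has to scan for the terminating NUL and compare strings to find a field.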

I feel like the specification could be lighter, a "light BSON" would
be ideal for a lot of applications I think.

In fact, a "light BSON" paired with 0MQ is basically what I'm
currently working on in a project :-)

--
Wolf

PS If anyone would like my C BSON implementation, I've been thinking
about releasing it under the MIT License, feel free to ask me.

On Sat, Mar 31, 2012 at 3:57 AM, Marten Feldtmann
<itlists <at> schrievkrom.de> wrote:
> What I find really interesting is the TNetString approach Mongrel2 is
> using instead of json.
>
> Marten
>
> On 31.03.2012 03:23, Steven McCoy wrote:
>> Interesting tidbit from a YouTube presentation,
>>
>>     *Serialization formats* - no matter which one you use, they are all
Rick Olson | 31 Mar 23:33 2012

Re: BSON as high performance serialisation

How's BSON compare to msgpack?  I've started using that in places.
Wolfgang Richter | 1 Apr 00:27 2012

Re: BSON as high performance serialisation

On Sat, Mar 31, 2012 at 5:33 PM, Rick Olson <technoweenie <at> gmail.com> wrote:
> How's BSON compare to msgpack?  I've started using that in places.

In my mind, it seems like the performance of BSON and msgpack could be
comparable.

msgpack's specification is more generic than BSON's (no
MongoDB-specifics), and it seems to be somewhat better specified.  In
addition, msgpack doesn't require a string 'key' per message, and its
format seems to be more compact (space-efficient) than BSON.  This
might imply quicker encoding/decoding, although that could also be
implementation-specific.
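
The size difference can be seen on a tiny message. The sketch below hand-encodes a flat map of short string keys to small non-negative integers in both formats; it covers only that narrow case, the function names are made up for illustration, and it says nothing about encoding speed.

```python
import struct


def bson_flat_ints(fields):
    """BSON bytes for a flat dict of str -> int32."""
    body = b"".join(
        b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
        for key, value in fields.items()
    )
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def msgpack_flat_ints(fields):
    """MessagePack bytes for at most 15 entries of short str -> int in 0..127."""
    if len(fields) > 15:
        raise ValueError("this sketch only emits the one-byte map header")
    out = bytes([0x80 | len(fields)])          # fixmap header
    for key, value in fields.items():
        raw = key.encode("utf-8")
        if len(raw) > 31 or not 0 <= value <= 127:
            raise ValueError("outside the range this sketch handles")
        out += bytes([0xA0 | len(raw)]) + raw  # short string header + bytes
        out += bytes([value])                  # positive fixint
    return out


sample = {"a": 1, "b": 2}
print(len(bson_flat_ints(sample)), "bytes as BSON")
print(len(msgpack_flat_ints(sample)), "bytes as msgpack")
```

Both encodings carry the key names here; the gap comes from BSON's fixed-width size and integer fields, while msgpack folds small lengths and small integers into a single byte. For large values the difference shrinks.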

msgpack looks really nice :-)

Both seem simple enough to implement on your own (no external
dependencies introduced, which can be nice).

However, msgpack seems nice because an ecosystem of software including
RPC is built around it (although it reinvents the communication layers
that could be managed by 0MQ...).

BSON's canonical implementation is the one included in MongoDB.

I'd expect things like msgpack to "win" in the long run (unless the
BSON spec is changed to be simpler+more generic; it's already simpler
than msgpack though) because they are more generic.

Lourens Naudé | 1 Apr 00:51 2012

Re: BSON as high performance serialisation

I think it's more difficult to draw comparisons when one factors in the binding ecosystem as well: a large part of the community uses libzmq through some higher-level binding. Most serialization wrappers tend to create additional heap cruft that stresses the GC in some languages. Here's an interesting case study:

* Deets : http://www.ohler.com/software/thoughts/Blog/Entries/2012/3/13_Need_for_Speed.html
* Implementation : https://github.com/ohler55/oj


So in summary, watch out for cases where micro-benchmarks of an implementation look fast, yet the implementation introduces a "hidden" cost in GC pressure relative to message volume, which can destroy any soft real-time guarantees and overall system throughput / performance.
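
One rough way to see this hidden cost is to compare bytes on the wire with the heap held by the decoded objects. The sketch below uses Python's standard json and tracemalloc modules as a stand-in for any binding in a garbage-collected language; the payload, the message count and the helper name are assumptions chosen for illustration.

```python
import json
import tracemalloc


def decoded_heap_bytes(payload, count):
    """Return the traced heap growth from holding `count` decoded copies."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        decoded = [json.loads(payload) for _ in range(count)]
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # `decoded` is still alive at the second measurement, so the difference
    # covers the objects the decoder created plus the list that holds them.
    return after - before


# Hypothetical small message and batch size.
payload = '{"id": 1, "tags": ["a", "b"]}'
count = 10_000

wire_bytes = len(payload.encode("utf-8")) * count
heap_bytes = decoded_heap_bytes(payload, count)
print(f"wire: {wire_bytes} bytes, decoded heap: {heap_bytes} bytes")
```

A timing-only micro-benchmark would not show this growth, which scales with message volume rather than with per-message decode time.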

I also think the topic should perhaps be taken off the list since libzmq does not impose message structure, BUT it's also important to keep tabs on this ( recommendations or real production feedback etc. ) somewhere on the wiki or docs for reference.

- Lourens

Wolfgang Richter | 1 Apr 00:55 2012

Re: BSON as high performance serialisation

> I also think the topic should perhaps be taken off the list since libzmq
> does not impose message structure, BUT it's also important to keep tabs on
> this ( recommendations or real production feedback etc. ) somewhere on the
> wiki or docs for reference.
>

True, although every now and then questions crop up regarding how to
send data structures via 0MQ across languages etc.

I think the documentation references ProtoBufs (FAQ does:
http://www.zeromq.org/area:faq), maybe we should add a list of
alternatives for people to look at (this is at least interesting to
some in the community)?

If 0MQ is trying its utmost to be performant in pushing messages, it
would be nice to pair it with a performant
serialization/deserialization solution.

YouTube reports that ProtoBufs is not so performant, which I guess
started this thread (and maybe many people pair ProtoBufs with
0MQ...).

--
Wolf
Wolfgang Richter | 1 Apr 01:35 2012

Re: BSON as high performance serialisation

Updated the FAQ to reflect the fact that choice of serialization
format/library isn't simple, and there are multiple solutions.

If you want to see the diff/what I added (feel free to add more),
check the history and compare revisions 124 and 125:

http://www.zeromq.org/area:faq

--
Wolf

On Sat, Mar 31, 2012 at 6:55 PM, Wolfgang Richter <wolf <at> cs.cmu.edu> wrote:
>> I also think the topic should perhaps be taken off the list since libzmq
>> does not impose message structure, BUT it's also important to keep tabs on
>> this ( recommendations or real production feedback etc. ) somewhere on the
>> wiki or docs for reference.
>>
>
>
> True, although every now and then questions crop up regarding how to
> send datastructures via 0MQ across languages etc.
>
> I think the documentation references ProtoBufs (FAQ does:
> http://www.zeromq.org/area:faq), maybe we should add a list of
> alternatives for people to look at (this is at least interesting to
> some in the community)?
>
> If 0MQ is trying its utmost to be performant in pushing messages, it
> would be nice to be paired with a performant
> serialization/deserialization solution.
>
> YouTube reports that ProtoBufs is not so performant, which I guess
> started this thread (and maybe many people pair ProtoBufs with
> 0MQ...).
>
> --
> Wolf
Pieter Hintjens | 1 Apr 01:42 2012

Re: BSON as high performance serialisation

On Sat, Mar 31, 2012 at 6:35 PM, Wolfgang Richter <wolf <at> cs.cmu.edu> wrote:

>> I think the documentation references ProtoBufs (FAQ does:
>> http://www.zeromq.org/area:faq), maybe we should add a list of
>> alternatives for people to look at (this is at least interesting to
>> some in the community)?

Yes, it's a common question, good to collect useful answers.

-Pieter
Justin Karneges | 1 Apr 05:37 2012

Re: BSON as high performance serialisation

ASN.1/BER/DER?

*ducks*

On Saturday, March 31, 2012 04:42:58 PM Pieter Hintjens wrote:
> On Sat, Mar 31, 2012 at 6:35 PM, Wolfgang Richter <wolf <at> cs.cmu.edu> wrote:
> >> I think the documentation references ProtoBufs (FAQ does:
> >> http://www.zeromq.org/area:faq), maybe we should add a list of
> >> alternatives for people to look at (this is at least interesting to
> >> some in the community)?
> 
> Yes, it's a common question, good to collect useful answers.
> 
> -Pieter
Steven McCoy | 1 Apr 00:40 2012

Re: BSON as high performance serialisation

On 31 March 2012 17:33, Rick Olson <technoweenie <at> gmail.com> wrote:

> How's BSON compare to msgpack?  I've started using that in places.

Not to dissuade anyone from MsgPack, which has a more convenient API to use, but MsgPack is surprisingly worse than Protocol Buffers.  Despite the claims on their website, the MsgPack project's testing procedure is unfortunately flawed.  This has previously been raised on the list.

After looking at http://bsonspec.org/#/specification I'm not sure how YouTube is finding BSON to be faster.  It looks like a 1st generation format like TIBCO's forms and not second generation qforms (using a dictionary), or third generation rforms (using dynamic dictionaries).
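
As a generic illustration of what "using a dictionary" buys (this is not TIBCO's actual wire format), the sketch below replaces string keys with small integer ids that both sides agree on out of band. The field names, the ids and the fixed uint16/int32 layout are all assumptions made up for the example.

```python
import struct

# Hypothetical field dictionary, shared by sender and receiver in advance.
FIELD_IDS = {"price": 1, "volume": 2, "sequence": 3}
FIELD_NAMES = {field_id: name for name, field_id in FIELD_IDS.items()}

# One entry is a little-endian uint16 field id followed by an int32 value.
ENTRY = struct.Struct("<Hi")


def encode_with_dictionary(record):
    """Encode str -> int32 fields as (id, value) pairs, with no key strings."""
    return b"".join(
        ENTRY.pack(FIELD_IDS[name], value) for name, value in record.items()
    )


def decode_with_dictionary(data):
    """Decode (id, value) pairs back into a dict using the shared dictionary."""
    return {
        FIELD_NAMES[field_id]: value
        for field_id, value in ENTRY.iter_unpack(data)
    }


record = {"price": 10150, "volume": 300, "sequence": 42}
encoded = encode_with_dictionary(record)
print(len(encoded), "bytes on the wire")
print(decode_with_dictionary(encoded))
```

The decoder never compares strings on the hot path, but the message is no longer self-describing: both ends must hold the same dictionary.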

More development appears to be towards convenience as hardware improvements make bit tweaking less productive.

-- 
Steve-o
Wolfgang Richter | 1 Apr 00:52 2012

Re: BSON as high performance serialisation

On Sat, Mar 31, 2012 at 6:40 PM, Steven McCoy <steven.mccoy <at> miru.hk> wrote:
> On 31 March 2012 17:33, Rick Olson <technoweenie <at> gmail.com> wrote:
>>
>> How's BSON compare to msgpack?  I've started using that in places.
>
>
> Not to dissuade from MsgPack having a more convenient API to use,
> but MsgPack is surprisingly worse than Protocol Buffers.  Despite their
> website claims, unfortunately the MsgPack projects testing procedure is
> flawed.  This has previously been raised on the list.

Right, although this is implementation-specific.

>
> After looking at http://bsonspec.org/#/specification I'm not sure how
> YouTube is finding BSON to be faster.  It looks like a 1st generation format
> like TIBCO's forms and not second generation qforms (using a dictionary), or
> third generation rforms (using dynamic dictionaries).

I agree; with string-based keys and space inefficiency, I'm wondering a bit too.

However, I think the key is in their "custom" BSON implementation.

I highly doubt YouTube uses vanilla BSON.

I think it is more likely that, just as BSON was "inspired" by JSON
(but is not just a simple extension), YouTube is using something
"inspired" by BSON, but optimized for their use cases, and implemented
accordingly.

Note they claim "implementation which is 10-15 times faster than the
one you can download," which to me means they have a custom
_implementation_ of BSON which might differ from spec, that is faster
than the MongoDB canonical implementation.
