Pedro Ferreira | 10 Feb 10:49

Server-side caching

Hello all,

A (possibly silly) question: does ZEO have some kind of server-side 
cache? I mean, each time an oid is requested by one of the clients is it 
retrieved from the DB file directly, or are some of the objects kept in 
memory? From what I see in the code, the latter doesn't seem to happen.

I know there are client-side caches, but in a multiple client/server 
context I wonder if it's not faster to ask the DB for an oid that is 
already in memory instead of retrieving it from the client cache?

Thanks in advance,

Pedro

-- 
José Pedro Ferreira

Software Developer, Indico Project
http://indico-software.org

+-----------+
+  '``'--- `+  CERN - European Organization for Nuclear Research
+ |CERN|  / +  1211 Geneve 23, Switzerland
+ ..__. \.  +  IT-CIS-AVC
+  \\___.\  +  Office: 513-1-005
+      /    +  Tel. +41227677159
+-----------+
_______________________________________________
For more information about ZODB, see http://zodb.org/

Jim Fulton | 10 Feb 12:42

Re: Server-side caching

On Fri, Feb 10, 2012 at 4:49 AM, Pedro Ferreira
<jose.pedro.ferreira <at> cern.ch> wrote:
> Hello all,
>
> A (possibly silly) question: does ZEO have some kind of server-side cache? I
> mean, each time an oid is requested by one of the clients is it retrieved
> from the DB file directly, or are some of the objects kept in memory? From
> what I see in the code, the latter doesn't seem to happen.

No -- and yes. :)

The OS' file-system cache acts as a storage server cache.  The storage
server does (essentially) no processing to data read from disk, so an
application-level cache would add nothing over the disk cache provided by
the storage server.

Also note that, for better or worse, FileStorage uses an in-memory index
of current record positions, so no disk access is needed to find current data.
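To make the index idea concrete, here is a toy sketch of an append-only store that keeps an oid-to-offset dictionary in memory. It is not FileStorage's actual code or record format; the class name ToyStorage and the length-prefixed layout are assumptions made up for illustration.

```python
import io
import struct


class ToyStorage:
    """Toy append-only record store with an in-memory oid -> offset index.

    Hypothetical illustration only; the record layout is invented and does
    not match the real FileStorage format.
    """

    def __init__(self):
        self._file = io.BytesIO()  # stands in for the data file on disk
        self._index = {}           # oid -> offset of the current record

    def store(self, oid, data):
        # Append a length-prefixed record and remember where it starts.
        self._file.seek(0, io.SEEK_END)
        pos = self._file.tell()
        self._file.write(struct.pack(">I", len(data)) + data)
        self._index[oid] = pos     # older revisions simply stay in the file

    def load(self, oid):
        # One dictionary lookup, then one seek and read; no scan of the file.
        pos = self._index[oid]
        self._file.seek(pos)
        (size,) = struct.unpack(">I", self._file.read(4))
        return self._file.read(size)
```

Finding the current record costs a dictionary lookup; only reading the record data itself touches the file, and that read is what the OS file-system cache can serve from memory.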

> I know there are client-side caches, but in a multiple client/server context
> I wonder if it's not faster to ask the DB for an oid that is already in
> memory instead of retrieving it from the client cache?

In general, I'd say no.  It can depend on lots of details, including:

- database size
- active set size
- network speed
- memory and disk speeds on clients and servers
- ...

Pedro Ferreira | 13 Feb 11:06

Re: Server-side caching

Dear Jim,

Thanks for your answer.

> The OS' file-system cache acts as a storage server cache.  The storage
> server does (essentially) no processing to data read from disk, so an
> application-level cache would add nothing over the disk cache provided by
> the storage server.

I see, then I guess it would be good to have at least the same amount of 
RAM as the total size of the DB, no? From what I see in our server, the 
linux buffer cache takes around 13GB of the 16G available, while the 
rest is mostly taken by the ZEO process (1.7G). The database is 17GB on 
disk.

> Also note that, for better or worse, FileStorage uses an in-memory index
> of current record positions, so no disk access is needed to find current data.

Yes, but pickles still have to be retrieved, right? I guess this would 
mean random access (for a database like ours, in which we have many 
small objects), which doesn't favor cache performance.

I'm asking this because in the tests we've made with SSDs we have seen a 
20% decrease in reading time for non-client-cached objects. So, there 
seems to be some disk i/o going on.

> In general, I'd say no.  It can depend on lots of details, including:
>
> - database size
> - active set size

Laurence Rowe | 13 Feb 13:19

Re: Server-side caching

On 13 February 2012 10:06, Pedro Ferreira <jose.pedro.ferreira <at> cern.ch> wrote:
>> The OS' file-system cache acts as a storage server cache.  The storage
>> server does (essentially) no processing to data read from disk, so an
>> application-level cache would add nothing over the disk cache provided by
>> the storage server.
>
>
> I see, then I guess it would be good to have at least the same amount of RAM
> as the total size of the DB, no? From what I see in our server, the linux
> buffer cache takes around 13GB of the 16G available, while the rest is
> mostly taken by the ZEO process (1.7G). The database is 17GB on disk.

Adding enough memory so the database fits in RAM is always a good idea.

Since the introduction of blobs, this should be possible (and
relatively cheap) for most ZODB deployments. For Plone sites, a 30GB
pre-blobs Data.fs typically falls to 2-3GB with blobs.

There's also the wrapper storage zc.zlibstorage which compresses ZODB
records allowing more of the database to fit in RAM (RelStorage has an
option to compress records.)
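A rough way to estimate how much record compression might save is to compress a sample of record data with zlib and compare sizes. The sketch below uses only the standard library; the function name compressed_ratio and the sample data are hypothetical, and the result says nothing about how zc.zlibstorage or RelStorage decide which records to compress.

```python
import pickle
import zlib


def compressed_ratio(records):
    """Return compressed size divided by raw size for an iterable of bytes."""
    raw = 0
    packed = 0
    for data in records:
        raw += len(data)
        packed += len(zlib.compress(data))
    # An empty sample is reported as "no saving".
    return packed / raw if raw else 1.0


# Hypothetical sample: pickles of small dictionaries with repetitive text.
sample = [
    pickle.dumps({"title": "Meeting %d" % i, "notes": "agenda item " * 20})
    for i in range(100)
]
print("compressed/raw: %.2f" % compressed_ratio(sample))
```

A ratio well below 1.0 on real records would suggest that more of the database could fit in the same amount of RAM once compressed.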

>> Also note that, for better or worse, FileStorage uses an in-memory index
>> of current record positions, so no disk access is needed to find current
>> data.
>
>
> Yes, but pickles still have to be retrieved, right? I guess this would mean
> random access (for a database like ours, in which we have many small
> objects), which doesn't favor cache performance.

Pedro Ferreira | 13 Feb 13:39

Re: Server-side caching

Hello,

Thanks a lot for your suggestions.

> You could try a ZEO fanout setup too, where you have a  ZEO server
> running on each client machine. The intermediary ZEO's client cache
> (you could put it on tmpfs if you have enough RAM) is then shared
> between all the clients running on that machine.

Like this?

http://svn.zope.org/ZODB/trunk/src/ZEO/tests/zeo-fan-out.test?rev=81822&view=markup

That looks like an interesting possibility. Is anyone using it in 
production? Is it stable/reliable? Any drawbacks besides the overhead 
that gets added by the additional ClientStorage?

Thanks a lot, once again.

Pedro


Laurence Rowe | 13 Feb 14:01

Re: Server-side caching

On 13 February 2012 12:39, Pedro Ferreira <jose.pedro.ferreira <at> cern.ch> wrote:
> Hello,
>
> Thanks a lot for your suggestions.
>
>
>> You could try a ZEO fanout setup too, where you have a  ZEO server
>> running on each client machine. The intermediary ZEO's client cache
>> (you could put it on tmpfs if you have enough RAM) is then shared
>> between all the clients running on that machine.
>
>
> Like this?
>
> http://svn.zope.org/ZODB/trunk/src/ZEO/tests/zeo-fan-out.test?rev=81822&view=markup
>
> That looks like an interesting possibility. Is anyone using it in
> production? Is it stable/reliable? Any drawbacks besides the overhead that
> gets added by the additional ClientStorage?

Yes, that seems to be the best docs I could find too (not that there's
much to document.) I've not tried it myself. I think people are using
it in production, but I don't remember who.

Laurence
ZODB-Dev mailing list  -  ZODB-Dev <at> zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev

Pedro Ferreira | 13 Feb 13:31

Re: Server-side caching


> between processes, which doesn't make them very useful, in which we

very useful *for our setup*


Jim Fulton | 13 Feb 15:49

Re: Server-side caching

On Mon, Feb 13, 2012 at 5:06 AM, Pedro Ferreira
<jose.pedro.ferreira <at> cern.ch> wrote:
> Dear Jim,
>
> Thanks for your answer.
>
>
>> The OS' file-system cache acts as a storage server cache.  The storage
>> server does (essentially) no processing to data read from disk, so an
>> application-level cache would add nothing over the disk cache provided by
>> the storage server.
>
>
> I see, then I guess it would be good to have at least the same amount of RAM
> as the total size of the DB, no? From what I see in our server, the linux
> buffer cache takes around 13GB of the 16G available, while the rest is
> mostly taken by the ZEO process (1.7G). The database is 17GB on disk.

Having enough ram to hold your entire database may not be practical.
Ideally, you want enough to hold the working set.  For many applications,
most of the database reads are from the later part of the file.  The working
set is often much smaller than the whole file.

>
>
>> Also note that, for better or worse, FileStorage uses an in-memory index
>> of current record positions, so no disk access is needed to find current
>> data.
>
>

Pedro Ferreira | 13 Feb 19:01

Re: Server-side caching

> Having enough ram to hold your entire database may not be practical.
> Ideally, you want enough to hold the working set.  For many applications,
> most of the database reads are from the later part of the file.  The working
> set is often much smaller than the whole file.

That is a very good point. I will try to find that out, maybe I can take 
a FileStorage index file and calculate the distribution.
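A minimal sketch of such a calculation, assuming the current record offsets have already been extracted from the index (how to load them is not shown here, and the function name offset_distribution is made up): it counts how many current records fall into each equal slice of the data file.

```python
def offset_distribution(offsets, file_size, buckets=10):
    """Count how many record offsets fall into each equal slice of the file."""
    counts = [0] * buckets
    for pos in offsets:
        # Clamp so an offset equal to file_size lands in the last bucket.
        slot = min(pos * buckets // file_size, buckets - 1)
        counts[slot] += 1
    return counts


# Hypothetical offsets in a 100-byte "file", split into 10 slices.
print(offset_distribution([0, 5, 95, 99, 100], 100))
```

Note that this only shows where current records live in the file, not how often each one is read, so it is at best a first approximation of the working set.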

>> I guess this would mean
>> random access (for a database like ours, in which we have many small
>> objects), which doesn't favor cache performance.
>
> I don't see how this follows.

I meant that if we have to retrieve different small pickles from disk, 
this will result in continuous access to random disk locations, which 
can be bad (depending on the granularity of the cache). However, 
considering what you've said above (that the working set should be 
located at the later part of the file), maybe that's not the case.

> The caches are still probably providing benefit, depending on how large they
> are.  If you haven't, you should probably try using the ZEO cache-analysis
> scripts to get a better handle on how effective our cache is and whether it
> should be larger.

Will do so.

> I imagine that someone will eventually figure out how to use
> memcached to implement a shared ZEO cache, as has been done
> for relstorage.