David Whiteman | 3 Apr 12:05 2012

GlusterFS performance with small files.

Hi,

I am currently looking into GlusterFS as a storage cluster for our email
storage. I want to mount the storage from several servers (or VMs); the
services accessing it include exim, courier-imapd and courier-pop3d. Our
emails are stored in Maildir format, which means many small files. I have
read that GlusterFS doesn't perform very well with small files. Is this
still the case?

I would like to achieve similar (or better) performance to our current 
NFS setup, with the added redundancy that GlusterFS provides.

Are there any utilities I can use to test the performance?

Thanks in advance
Bryan Whitehead | 3 Apr 18:40 2012

Re: GlusterFS performance with small files.

A bunch of small files means terrible performance. There's really not
much you can do about that, other than storing each mailbox in a single
file. Maildir format is definitely going to suck.
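
To illustrate the difference in on-disk layout, here is a minimal sketch
using Python's standard `mailbox` module. It is not from the thread; the
directory names and message contents are made up for the example. It writes
a few messages into a Maildir (one file per message), copies them into a
single mbox file, and counts the files each format produces.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(i):
    """Build a tiny, made-up test message."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "user@example.com"
    msg["Subject"] = f"test message {i}"
    msg.set_content("hello\n")
    return msg


def count_files(root):
    """Count regular files below root."""
    return sum(len(files) for _, _, files in os.walk(root))


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message of a Maildir into one mbox file."""
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path)
    dst.lock()  # hold the mbox lock while appending
    try:
        for message in src:
            dst.add(message)
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()


def demo(n_messages=5):
    """Return (files used by the Maildir, files used by the mbox)."""
    base = tempfile.mkdtemp(prefix="mailfmt-")
    maildir_path = os.path.join(base, "Maildir")
    mbox_dir = os.path.join(base, "mbox")
    os.mkdir(mbox_dir)
    mbox_path = os.path.join(mbox_dir, "inbox")

    box = mailbox.Maildir(maildir_path, create=True)
    for i in range(n_messages):
        box.add(make_message(i))
    box.close()

    maildir_to_mbox(maildir_path, mbox_path)
    return count_files(maildir_path), count_files(mbox_dir)
```

The Maildir side grows by one file per message, so every delivery and every
read is a separate create/lookup/open on the cluster filesystem, while the
mbox side stays a single file. Whether a conversion like this is acceptable
depends on the mail software in use.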

Haris Zukanovic | 3 Apr 18:55 2012

Re: GlusterFS performance with small files.

Is there anything I can do to optimize reads of small files in a
replicated Gluster setup? The files already reside in the brick on the
server in question.
Something like disabling the various checks for files that I know are not
updated often? For example, web files like images uploaded through the
CMS. These files are uploaded once and never modified again...

kind regards
Haris


Jeff White | 3 Apr 19:17 2012

Re: GlusterFS performance with small files.

Using Gluster's NFS instead of the native (FUSE) client can get you 
better performance with many small files but you lose functionality by 
doing that.

There's lots of variance in all of this so the best thing you can do is 
test and benchmark on your own datasets and systems.
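
As a starting point for such a benchmark, here is a rough sketch in Python.
It is not a Gluster tool; the function name, file count and file size are
arbitrary assumptions. It creates many small files in a directory, then
stats, reads and deletes them, and reports operations per second for each
phase.

```python
import os
import tempfile
import time


def small_file_benchmark(directory, n_files=1000, size=4096):
    """Create, stat, read and delete n_files files of `size` bytes each.

    Returns a dict mapping phase name to operations per second.
    """
    payload = b"x" * size
    paths = [os.path.join(directory, f"msg-{i:06d}") for i in range(n_files)]
    results = {}

    def timed(phase, func):
        # Run func on every path and record the achieved rate.
        start = time.perf_counter()
        for path in paths:
            func(path)
        elapsed = time.perf_counter() - start
        results[phase] = n_files / elapsed if elapsed > 0 else float("inf")

    def create(path):
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            # fsync so the write reaches storage, as a mail delivery
            # agent would typically require before accepting a message.
            os.fsync(f.fileno())

    def read(path):
        with open(path, "rb") as f:
            f.read()

    timed("create", create)
    timed("stat", os.stat)
    timed("read", read)
    timed("delete", os.remove)
    return results
```

Call it once per mount you want to compare, for example
`small_file_benchmark("/mnt/gluster/bench")` (a hypothetical path), and
repeat for the NFS mount and a local disk. Note that the read phase runs
right after the create phase, so client-side caching can flatter the
numbers; for a fairer read test, create the files from one client and read
them from another.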

Jeff White - Linux/Unix Systems Engineer
University of Pittsburgh - CSSD


Vladislav Tchernev | 4 Apr 15:35 2012

Re: GlusterFS performance with small files.

Here is a nice solution for using NFS while preserving the redundancy
benefit of the FUSE client:
http://community.gluster.org/a/nfs-performance-with-fuse-client-redundancy/

Cheers
Vlad

--
----
Vladislav Tchernev
Senior System Administrator
Broadsign INT
+1 (514) 399-1184
www.broadsign.com

David Whiteman | 4 Apr 13:15 2012

Re: GlusterFS performance with small files.

Hi,

Thanks for the reply. Changing to mbox is not really an option; we are
stuck with Maildir format.

All the current cluster filesystems I've looked into seem to have problems
with small files.

I guess the only alternative is a DRBD setup, but that would limit me to
2 nodes, which is the reason I was looking into GlusterFS in the first
place.

Does anyone know of any alternatives to GlusterFS that offer performance
similar to NFS with very small files?

Thanks

Jerker Nyberg | 5 Apr 11:23 2012

Re: GlusterFS performance with small files.


I only have basic knowledge (I am a system administrator, not a file system
developer), but this is how I understand the current situation:

You don't have to use a distributed parallel cluster file system (Lustre,
GlusterFS, Ceph, Panasas, FhGFS etc.); there are also shared-disk file
systems to look into (OCFS2, GFS2 (Red Hat Global File System), StorNext
(known as Xsan on Mac) etc.). I have not really understood where GPFS fits
in. As far as I understand, it is block based but can scale to many
servers, but I guess you do not need hundreds of backend servers. Some of
these require quite some knowledge and time to set up correctly.

I have personally only run GlusterFS and Ceph, although Panasas is also
used at our university for HPC. For several years we ran Xsan for some Mac
servers (Podcast Producer and Apache/MySQL) with a Fibre Channel attached
Xraid, but I would not recommend that solution today.

At a hosting company almost ten years ago, we split the users between
different backend storage NFS/MySQL servers and then put a couple of
front-end servers (load balanced with LVS) in front of each backend,
running Postfix/Courier-IMAP/Apache/etc. It is a proven solution, although
it does not scale beyond a single machine the way the modern cloud-inspired
file systems do... But none of those is quite as stable yet.
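
The split of users across backends described above can be sketched as a
stable mapping from user name to backend server. This is only an
illustration in Python: the backend host names are hypothetical, and the
hash-based assignment is an assumption, since a setup like this may just as
well use a lookup table.

```python
import hashlib

# Hypothetical backend storage servers; each one has its own front ends.
BACKENDS = ["store1.example.com", "store2.example.com", "store3.example.com"]


def backend_for(user, backends=BACKENDS):
    """Return the backend that holds this user's mail.

    Hashes the user name so that every front end computes the same
    answer without any shared state.
    """
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

A plain modulo mapping like this moves most users to a different backend
whenever a server is added, so a real deployment would more likely keep an
explicit user-to-backend table and migrate mailboxes deliberately.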

Many SSD drives can fit in a normal PC server nowadays. For mail using
Maildir, IOPS are usually more important than bandwidth anyway. Keep your
eyes open for FreeBSD/ZFS or illumos/ZFS too. ZFS still seems to be
several years ahead of anything native to Linux. ZFS on Linux
(ZFSonLinux.org) has been stable on my backup server as long as I do not
use deduplication.
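
On the point that IOPS matter more than bandwidth for Maildir, a rough
back-of-the-envelope calculation in Python shows why. The figures are
illustrative assumptions, not measurements from this thread.

```python
def throughput_mib_per_s(iops, avg_message_bytes):
    """Data rate reached when every operation moves one whole message."""
    return iops * avg_message_bytes / (1024 * 1024)


# Assumed figures: 200 random operations per second (the order of
# magnitude of a single spinning disk) and an 8 KiB average message.
rate = throughput_mib_per_s(iops=200, avg_message_bytes=8 * 1024)
print(f"{rate:.2f} MiB/s")
```

Under those assumptions the storage moves only about 1.6 MiB/s, far below
what even a single gigabit link can carry, so the number of operations per
second is the limit long before bandwidth is.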

--jerker

Brian Candler | 5 Apr 20:31 2012

Re: GlusterFS performance with small files.

On Tue, Apr 03, 2012 at 11:05:20AM +0100, David Whiteman wrote:
> I am currently looking into GlusterFS to use as a storage cluster
> for our email storage. I want to mount the storage from different
> servers (or VMs), services accessing the storage include exim,
> courier-imapd, courier-pop3d. Our emails are stored in MailDir
> format, which is many small files. I have read that GlusterFS
> doesn't perform very well with small files, is this still the case?

Depends. A *replicated* volume may be very slow, but maybe you don't need
replication: e.g. you can use a non-replicated (plain distributed) volume,
with geo-replication to make an off-site disaster recovery backup. It's a
tradeoff between high availability and performance.

> Is there any utilities I can use to test the performance?

I think on coker.com.au there are some benchmark utilities for mail delivery
and retrieval.
