John Heim | 25 Jun 2012 22:49

distributed network file system

I would like to set up a distributed network file system in my department. 
There is a dizzying array of possibilities, gfarmfs, ceph, glusterfs, just 
to name a few.

Needs:
1. Should work on a large number of small nodes, 100 GB each.
2. Parallelism & striping.
3. Prefer a Debian package, GPL.
4. Metadata in MySQL would be nice.

Any experience and/or recommendations?

_______________________________________________
Ale mailing list
Ale@...
http://mail.ale.org/mailman/listinfo/ale
See JOBS, ANNOUNCE and SCHOOLS lists at
http://mail.ale.org/mailman/listinfo

Derek Atkins | 26 Jun 2012 15:04

Re: distributed network file system

Hi,

"John Heim" <john@...> writes:

> I would like to set up a distributed network file system in my department. 
> There is a dizzying array of possibilities, gfarmfs, ceph, glusterfs, just 
> to name a few.
>
> Needs:
> 1. Should work on a large number of small nodes, 100 GB each.
> 2. Parallelism & striping.
> 3. Prefer a Debian package, GPL.
> 4. Metadata in MySQL would be nice.
>
> Any experience and/or recommendations?

What are your requirements for usage of the space?  Are you trying to
get a distributed SAN array?  Or are you just trying to get a
distributed file space?

If the latter you might also want to look at OpenAFS.  It is F/OSS,
although it's not GPL.  Oh, and the metadata isn't stored in MySQL.

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@...                        PGP key available

John Heim | 26 Jun 2012 19:24

Re: distributed network file system

From: "Derek Atkins" <warlord@...>
To: "Atlanta Linux Enthusiasts" <ale@...>
Sent: Tuesday, June 26, 2012 8:04 AM
Subject: Re: [ale] distributed network file system

> Hi,
>
> "John Heim" <john@...> writes:
>
>> I would like to set up a distributed network file system in my 
>> department.
>> There is a dizzying array of possibilities, gfarmfs, ceph, glusterfs, 
>> just
>> to name a few.
>>
>> Needs:
>> 1. Should work on a large number of small nodes, 100 GB each.
>> 2. Parallelism & striping.
>> 3. Prefer a Debian package, GPL.
>> 4. Metadata in MySQL would be nice.
>>
>> Any experience and/or recommendations?
>
> What are your requirements for usage of the space?  Are you trying to
> get a distributed SAN array?  Or are you just trying to get a
> distributed file space?
>
> If the latter you might also want to look at OpenAFS.  It is F/OSS,
> although it's not GPL.  Oh, and the metadata isn't stored in MySQL.


Robert L. Harris | 26 Jun 2012 19:35

Re: distributed network file system


Check out DRDB.  I'm using it to sync two 750G filesystems and it works spectacularly.  You should be able to do multiple nodes syncing 1 filesystem and it just generates additional mirrors.

Robert


On Tue, Jun 26, 2012 at 11:24 AM, John Heim <john-Z+tRoT2Tcg3k1uMJSBkQmQ@public.gmane.org> wrote:

We have a 2 TB SAN for users to use for file space. But we have about 300
users, so each gets only 6 GB.  That's just not enough for some users.
Mostly, it's enough on a long-term basis, but sometimes they need to generate
50 GB - 100 GB of data. We have all kinds of disk space on each user's
workstation, but they can't get to it. This is deliberate. We don't want them
saving files where they won't be backed up.  And we want to be able to
re-image a machine at a moment's notice w/o having to have the user back up
his stuff.

I got the brilliant idea of using the 100 GB (or so) of free space on each
workstation for a distributed network file system.  So we'd need to be able
to wipe out a node w/o losing anything.  I could make sure we copy the data
off before we re-image a workstation. But an end user might simply turn
their workstation off. Whatever we use would have to deal with that.

The MySQL thing was just a preference (over PostgreSQL). I have nothing
against other DBMSes. It's just that we already have MySQL.
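
A rough capacity sketch for the numbers above. The SAN size, user count, and 100 GB of free space per workstation come from the message; the workstation count (one per user) and the number of copies kept of each block are assumptions for illustration only, since the thread states neither.

```python
# Back-of-the-envelope capacity figures, in decimal units (1 TB = 1000 GB).


def per_user_quota_gb(san_tb: float, users: int) -> float:
    """Even split of the SAN across all users, in GB."""
    return san_tb * 1000 / users


def usable_pool_tb(nodes: int, free_gb_per_node: float, replicas: int) -> float:
    """Usable space when every block is kept on `replicas` different nodes."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return nodes * free_gb_per_node / replicas / 1000


if __name__ == "__main__":
    print(f"SAN share per user: {per_user_quota_gb(2, 300):.1f} GB")
    # Assumed: 300 workstations (one per user) and 3 copies of each block.
    print(f"Usable pool: {usable_pool_tb(300, 100, 3):.1f} TB")
```

Keeping several copies is what lets a workstation be wiped or switched off without losing data, and the price is that the usable pool shrinks by the same factor.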

--
:wq!
---------------------------------------------------------------------------
Robert L. Harris

DISCLAIMER:
      These are MY OPINIONS             With Dreams To Be A King,
       ALONE.  I speak for                      First One Should Be A Man
       no-one else.                                     - Manowar
Ed Cashin | 26 Jun 2012 21:24

Re: distributed network file system

Robert, if you really meant to spell it the way you did, I apologize and humbly request a link to the project so that I can learn more about it.  Otherwise...

Not really a nit but just a pointer for folks who want to google for RLH's nice suggestion: I always mix up the letters in DRBD, but it helps to remember that the last two stand for "block device".

I think it's worth mentioning that in addition to distributed filesystems, there have been a lot of new systems that are more like big storage containers, where you look up a value by providing a key.  They came out of the distributed hash table research in the early 2000s, so you can find them by following the DHT links on Wikipedia.  If you don't need full filesystem semantics and you want the number of nodes and locations to grow a lot, they might be interesting.
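
To make the key-to-node lookup concrete, here is a minimal sketch of consistent hashing, one technique commonly used by DHT-style stores. It is a toy, not the design of any particular system; the node names (`ws01` and so on) and the `HashRing` class are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (the 160-bit SHA-1 space)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: a key lives on the next node clockwise."""

    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        hashes = [h for h, _ in self._points]
        # First node point past the key's hash, wrapping around at the end.
        i = bisect.bisect_right(hashes, _hash(key)) % len(self._points)
        return self._points[i][1]

    def remove(self, node: str) -> None:
        """Drop a node, e.g. a workstation that was switched off."""
        self._points = [(h, n) for h, n in self._points if n != node]


if __name__ == "__main__":
    ring = HashRing(["ws01", "ws02", "ws03", "ws04"])
    print(ring.node_for("results/run-17.dat"))
    ring.remove("ws03")
    print(ring.node_for("results/run-17.dat"))
```

The property that matters when nodes come and go is that removing one node only relocates the keys that lived on it; every other key stays where it was.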

--
  Ed Cashin <ecashin-natZqmVBwV3k1uMJSBkQmQ@public.gmane.org>
  http://noserose.net/e/
  http://www.coraid.com/
Jim Kinney | 26 Jun 2012 19:38

Re: distributed network file system

OK. That makes sense.

I was looking at using gluster or GFS to reuse space on school hard drives as redundant storage for student work. That seemed feasible as long as they were running a Linux.

--
James P. Kinney III

Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain

http://electjimkinney.org
http://heretothereideas.blogspot.com/

Scott Plante | 27 Jun 2012 17:36

Re: distributed network file system

"We have all kinds of disk space on each users workstation but they can't get to it."

So are these all Linux workstations? 

Scott Plante
Vernard Martin | 26 Jun 2012 22:16

Re: distributed network file system

PVFS or Lustre work quite well for these tasks. I've managed to use
gigabit Ethernet to tie together 16 nodes with only 75 GB each on them to
form a fast filesystem. Also, FhGFS is new on the block and very nice as
well.

V
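
As an illustration of what striping across 16 nodes means, here is a minimal sketch of round-robin striping: it maps a byte offset in a file to the node holding it and the offset within that node's piece. The `locate` function and the 64 KiB stripe size are hypothetical, and this is not the actual layout code of PVFS or Lustre.

```python
from typing import Tuple


def locate(offset: int, stripe_size: int, node_count: int) -> Tuple[int, int]:
    """Return (node index, byte offset within that node's piece of the file)."""
    if offset < 0 or stripe_size <= 0 or node_count <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    stripe_index, within = divmod(offset, stripe_size)
    node = stripe_index % node_count          # stripes are dealt out round-robin
    local_stripe = stripe_index // node_count  # how many stripes this node holds before it
    return node, local_stripe * stripe_size + within


if __name__ == "__main__":
    stripe = 64 * 1024  # assumed stripe size
    for off in (0, stripe, 16 * stripe + 10):
        print(off, locate(off, stripe, 16))
```

Because consecutive stripes land on different nodes, a large sequential read or write is spread over all 16 machines at once, which is where the speed comes from.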


Jeff Hubbs | 26 Jun 2012 22:55

Re: distributed network file system

A couple years ago or so, I was able to get ATA Over Ethernet going 
between one target and two initiators, all running Gentoo.  It worked 
perfectly but for the lack of a cluster-aware filesystem; things got 
crazy in there pretty quick when both initiators mounted the target and 
started making writes. :)
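
A toy model of why that goes wrong, under the assumption that each initiator runs an ordinary single-host filesystem that caches its allocation state at mount time. The `SharedDisk` and `NaiveMount` classes are hypothetical and do not model AoE or any real filesystem in detail.

```python
class SharedDisk:
    """A raw block device exported to several initiators."""

    def __init__(self, blocks: int):
        self.bitmap = [False] * blocks  # on-disk allocation bitmap
        self.data = [None] * blocks


class NaiveMount:
    """A filesystem that is not cluster-aware: it trusts its own cached bitmap."""

    def __init__(self, disk: SharedDisk, name: str):
        self.disk = disk
        self.name = name
        self.cached_bitmap = list(disk.bitmap)  # snapshot taken at mount time

    def write_file(self, payload: str) -> int:
        # Pick the first block that *this host* believes is free.
        block = self.cached_bitmap.index(False)
        self.cached_bitmap[block] = True
        self.disk.bitmap[block] = True
        self.disk.data[block] = (self.name, payload)
        return block


if __name__ == "__main__":
    disk = SharedDisk(4)
    a = NaiveMount(disk, "initiator-a")
    b = NaiveMount(disk, "initiator-b")
    a.write_file("a's data")
    b.write_file("b's data")  # lands on the same block
    print(disk.data[0])
```

Both hosts pick block 0 because neither sees the other's allocation, so the second write silently replaces the first. A cluster-aware filesystem adds locking or coordination so that hosts agree on who owns which block.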


Brian Mathis | 27 Jun 2012 23:37

Re: distributed network file system

I've looked around for things to do this for a while, and finally came
across Tahoe-LAFS, which runs on Linux, Windows, Mac, BSD, etc.  I
have not used it myself, but the project seems to have the same goals
that you do.  I'm not sure if it's good for active online
storage though -- I always thought about it as extra storage for
backups.

https://tahoe-lafs.org/trac/tahoe-lafs
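
Tahoe-LAFS stores each file as erasure-coded shares, so any k of the n shares are enough to rebuild it. The sketch below estimates how often a file would be readable when nodes are switched off at random. The 3-of-10 figures are believed to match the Tahoe-LAFS default encoding, and the 0.9 per-node uptime is purely an assumption; node failures are also assumed to be independent, which may not hold for workstations in one office.

```python
from math import comb


def availability(k: int, n: int, p_up: float) -> float:
    """Probability that at least k of n share-holding nodes are up."""
    if not (1 <= k <= n) or not (0.0 <= p_up <= 1.0):
        raise ValueError("need 1 <= k <= n and 0 <= p_up <= 1")
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i) for i in range(k, n + 1)
    )


def expansion_factor(k: int, n: int) -> float:
    """Raw disk space consumed per byte of user data."""
    return n / k


if __name__ == "__main__":
    print(f"3-of-10 at 90% uptime: {availability(3, 10, 0.9):.6f}")
    print(f"space overhead: {expansion_factor(3, 10):.2f}x")
```

Compared with keeping full copies, this spends roughly the same disk space as three-way replication while tolerating far more simultaneous node losses, at the cost of more network traffic per read.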

❧ Brian Mathis

Jim Kinney | 28 Jun 2012 00:48

Re: distributed network file system

Excellent! Good project link. Thanks!

This has many, many uses!
