GlusterFS
by Alex on Oct.13, 2009, under Purdue
At work we’ve been evaluating different distributed file systems in our spare time. Currently, we use one large, centralized filer and have had trouble pushing as many input/output operations through it as we’d like. While that’s mostly a backend disk problem, wouldn’t it be great to have a storage system that grew as we added more cluster nodes?
In that hope, we tested some pretty alpha-level pNFS code, some Hadoop, and now GlusterFS. All of these seem to have some faults, but this is what I found for Gluster…
Downloading the RPMs from the main FTP repository and installing them was pretty painless on RHEL5. The documentation is pretty sparse and misleading, but eventually whipping up these config files made it all go:
#glusterfsd.vol
volume posix
  type storage/posix
  option directory /glusterfs
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume
and
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host foobar-0.example.com
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host foobar-2.example.com
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp
  option remote-host pfnstest-003.example.com
  option remote-subvolume brick
end-volume

volume remote4
  type protocol/client
  option transport-type tcp
  option remote-host foobar-4.example.com
  option remote-subvolume brick
end-volume

volume remote5
  type protocol/client
  option transport-type tcp
  option remote-host foobar-5.example.com
  option remote-subvolume brick
end-volume

volume remote6
  type protocol/client
  option transport-type tcp
  option remote-host foobar-6.example.com
  option remote-subvolume brick
end-volume

volume replicate1
  type cluster/replicate
  subvolumes remote1 remote2 remote3
end-volume

volume replicate2
  type cluster/replicate
  subvolumes remote4 remote5 remote6
end-volume

volume distribute
  type cluster/distribute
  subvolumes replicate1 replicate2
end-volume

volume writebehind
  type performance/write-behind
  option window-size 4MB
  subvolumes distribute
end-volume

volume cache
  type performance/io-cache
  option cache-size 1024MB
  subvolumes writebehind
end-volume
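The client configuration above follows a regular pattern: one protocol/client volume per server, replicate sets of three, a distribute layer over the replicate sets, then write-behind and io-cache on top. Since writing that out by hand gets tedious as nodes are added, here is a rough sketch of generating it. The function name build_client_volfile and the node-N.example.com host names are made up for illustration; the volume types and option values are copied from the config above.

```python
def build_client_volfile(hosts, replica_count=3):
    """Return client volfile text for `hosts`, grouped into replicate sets.

    Hypothetical helper: it only mirrors the layout of the hand-written
    config (client volumes -> replicate -> distribute -> write-behind -> io-cache).
    """
    if not hosts or len(hosts) % replica_count != 0:
        raise ValueError("host count must be a non-zero multiple of replica_count")

    lines = []

    def volume(name, vtype, options=(), subvolumes=()):
        # Emit one "volume ... end-volume" stanza.
        lines.append("volume %s" % name)
        lines.append("  type %s" % vtype)
        for key, value in options:
            lines.append("  option %s %s" % (key, value))
        if subvolumes:
            lines.append("  subvolumes %s" % " ".join(subvolumes))
        lines.append("end-volume")
        lines.append("")

    # One protocol/client volume per server, each pointing at its "brick".
    remotes = []
    for i, host in enumerate(hosts, start=1):
        name = "remote%d" % i
        volume(name, "protocol/client", [
            ("transport-type", "tcp"),
            ("remote-host", host),
            ("remote-subvolume", "brick"),
        ])
        remotes.append(name)

    # Group consecutive clients into replicate sets.
    replicas = []
    for start in range(0, len(remotes), replica_count):
        name = "replicate%d" % (start // replica_count + 1)
        volume(name, "cluster/replicate",
               subvolumes=remotes[start:start + replica_count])
        replicas.append(name)

    # Spread files across the replicate sets, then stack the performance layers.
    volume("distribute", "cluster/distribute", subvolumes=replicas)
    volume("writebehind", "performance/write-behind",
           [("window-size", "4MB")], ["distribute"])
    volume("cache", "performance/io-cache",
           [("cache-size", "1024MB")], ["writebehind"])
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder host names, not the machines used in the test above.
    example_hosts = ["node-%d.example.com" % n for n in range(1, 7)]
    print(build_client_volfile(example_hosts))
```

With six hosts and the default replica count of three, this produces the same volume graph as the hand-written file: two replicate sets under a single distribute volume.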
The backing storage for this GlusterFS was an Ext3 file system carved out of LVM and housed on HP SATA disk trays. Mounting up that file system and running some simplistic tests, I found that with large files the file system performed at about the maximum network speed, and that with small files performance was in the 5-10MB/s range. Not bad for an hour or two’s worth of effort.
October 13th, 2009 on 6:20 pm
Hi Alex,
What was the ‘small’ block size you were using?
Regards,
Amar
October 25th, 2009 on 8:49 pm
I was testing using 1KB sized files. I essentially just timed how long it took to write out several thousand of them in a single directory.
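A test along those lines could be sketched roughly as follows. This is an illustration, not the original script: the function name time_small_file_writes, the file count, and the file naming are all assumptions, and it writes to a temporary directory so it runs anywhere. To test GlusterFS, pass a directory on the mounted volume instead.

```python
import os
import tempfile
import time


def time_small_file_writes(directory, count=5000, size=1024):
    """Write `count` files of `size` bytes into `directory`.

    Returns (elapsed_seconds, megabytes_per_second).
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, "file_%06d" % i)
        with open(path, "wb") as handle:
            handle.write(payload)
    elapsed = time.perf_counter() - start
    total_mb = (count * size) / (1024.0 * 1024.0)
    # Guard against a zero reading on very coarse timers.
    rate = total_mb / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


if __name__ == "__main__":
    # A temp dir stands in for the GlusterFS mount point here.
    with tempfile.TemporaryDirectory() as target:
        elapsed, rate = time_small_file_writes(target)
        print("wrote 5000 x 1KB files in %.2fs (%.2f MB/s)" % (elapsed, rate))
```

Note that this does not flush to disk or drop caches, so on a local file system it mostly measures the page cache. At 1KB per file, a rate of 5-10MB/s corresponds to roughly 5,000 to 10,000 file creations per second.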
November 20th, 2009 on 5:59 pm
Hi Alex, Thanks for the article. You should also take a look at two recently added features, quick-read and stat-prefetch, which improve small-file performance. I also recommend using 2-way replication on top of RAID’ed volumes rather than 3-way Google-style replication. It is more economical and faster.
Your feedback on documentation is correct. We are working on it.
We are also working towards the Gluster Platform release (Dec 2009), where the setup is done entirely through a browser.