A Blog

GlusterFS

by Alex on Oct.13, 2009, under Purdue

At work we’ve been evaluating different distributed file systems in our spare time. Currently, we use one large, centralized filer and have had trouble pushing as many input/output operations through it as we’d like. While that’s mostly a backend disk problem, wouldn’t it be great to have a storage system that grew as we added more cluster nodes?
In that hope, we tested some pretty alpha-level pNFS code, some Hadoop, and now GlusterFS. All of these seem to have some faults, but this is what I found for Gluster…
Downloading the RPMs from the main FTP repository and installing them on RHEL5 was pretty painless. The documentation is pretty sparse and misleading, but eventually I whipped up these config files, which made it all go:

#glusterfsd.vol
volume posix
  type storage/posix
  option directory /glusterfs
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

and, for the clients:

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host foobar-0.example.com
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host foobar-2.example.com
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp
  option remote-host pfnstest-003.example.com
  option remote-subvolume brick
end-volume

volume remote4
  type protocol/client
  option transport-type tcp
  option remote-host foobar-4.example.com
  option remote-subvolume brick
end-volume

volume remote5
  type protocol/client
  option transport-type tcp
  option remote-host foobar-5.example.com
  option remote-subvolume brick
end-volume

volume remote6
  type protocol/client
  option transport-type tcp
  option remote-host foobar-6.example.com
  option remote-subvolume brick
end-volume

volume replicate1
  type cluster/replicate
  subvolumes remote1 remote2 remote3
end-volume

volume replicate2
  type cluster/replicate
  subvolumes remote4 remote5 remote6
end-volume

volume distribute
  type cluster/distribute
  subvolumes replicate1 replicate2
end-volume

volume writebehind
  type performance/write-behind
  option window-size 4MB
  subvolumes distribute
end-volume

volume cache
  type performance/io-cache
  option cache-size 1024MB
  subvolumes writebehind
end-volume
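
The client file is mostly one stanza repeated six times, so it could be generated instead of typed. Here is a minimal sketch in Python, assuming the same layout as above: one protocol/client volume per server, consecutive groups of three under cluster/replicate, and a single cluster/distribute on top. The helper name `client_volfile`, its parameters, and the `node*.example.com` host names are made up for illustration and are not part of GlusterFS. The write-behind and io-cache volumes would still be appended by hand.

```python
def client_volfile(hosts, replica_size=3, remote_subvolume="brick"):
    """Return client-side volfile text: one protocol/client volume per host,
    grouped into cluster/replicate sets under one cluster/distribute volume.

    Hypothetical helper for illustration; not part of GlusterFS.
    """
    if not hosts or len(hosts) % replica_size != 0:
        raise ValueError("need a non-empty host list divisible by replica_size")

    stanzas = []

    # One protocol/client volume per storage server, named remote1..remoteN.
    for i, host in enumerate(hosts, start=1):
        stanzas.append(
            f"volume remote{i}\n"
            f"  type protocol/client\n"
            f"  option transport-type tcp\n"
            f"  option remote-host {host}\n"
            f"  option remote-subvolume {remote_subvolume}\n"
            f"end-volume\n"
        )

    # Consecutive groups of `replica_size` clients form one replicate volume.
    replica_names = []
    for n, first in enumerate(range(1, len(hosts) + 1, replica_size), start=1):
        members = " ".join(f"remote{j}" for j in range(first, first + replica_size))
        replica_names.append(f"replicate{n}")
        stanzas.append(
            f"volume replicate{n}\n"
            f"  type cluster/replicate\n"
            f"  subvolumes {members}\n"
            f"end-volume\n"
        )

    # Files are spread across the replica sets by cluster/distribute.
    stanzas.append(
        "volume distribute\n"
        "  type cluster/distribute\n"
        f"  subvolumes {' '.join(replica_names)}\n"
        "end-volume\n"
    )
    return "\n".join(stanzas)


if __name__ == "__main__":
    # Placeholder host names, not the ones from the test cluster.
    print(client_volfile([f"node{i}.example.com" for i in range(1, 7)]))
```

Changing `replica_size` to 2 with an even number of hosts would give pairs instead of triples, which makes it cheap to try other layouts.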

The backing storage for this GlusterFS setup was an Ext3 file system carved out of LVM and housed on HP SATA disk trays. After mounting that file system and running some simplistic tests, I found that with large files the throughput was about at the maximum network speed, and that with small files it was in the 5-10MB/s range. Not bad for an hour or two’s worth of effort.
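
For reference, the small-file test amounts to writing several thousand 1KB files into a single directory and timing it. A rough sketch of that kind of test in Python follows. The function name `time_small_files`, the default count of 5000, and the file naming are assumptions for illustration, not the exact script used. It does not call fsync, so on a local disk the numbers mostly reflect the page cache.

```python
import os
import tempfile
import time


def time_small_files(directory, count=5000, size=1024):
    """Write `count` files of `size` bytes into `directory`.

    Returns (elapsed_seconds, megabytes_per_second).
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"file{i:06d}")
        with open(path, "wb") as f:
            f.write(payload)
    elapsed = time.perf_counter() - start
    mb_per_s = (count * size) / (1024 * 1024) / elapsed
    return elapsed, mb_per_s


if __name__ == "__main__":
    # Point this at a directory on the GlusterFS mount to test it;
    # a temporary directory is used here so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        secs, rate = time_small_files(d, count=2000)
        print(f"{secs:.2f}s, {rate:.2f} MB/s")
```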

3 comments for this entry:
  1. bulde

    Hi Alex,
    What ‘small’ block size were you using?

    Regards,
    Amar

  2. Alex

    I was testing using 1KB sized files. I essentially just timed how long it took to write out several thousand of them in a single directory.

  3. Anand Babu Periasamy

    Hi Alex, thanks for the article. You should also take a look at the recently added features – quick-read and stat-prefetch – for small-file performance improvement. I also recommend using 2-way replication on top of RAID’ed volumes rather than 3-way Google-style replication. It is more economical and faster.

    Your feedback on documentation is correct. We are working on it.

    We are also working towards the Gluster Platform release (Dec 2009), where the setup is done entirely through a browser.
