A Blog

Dirvish

by Alex on Feb.08, 2010, under Tinkergeek

Tinkergeek recently moved to its own dedicated server hosted by fdc servers. The transition wasn’t quite as smooth as one would hope, but they do provide cheap hosting. With moving to a dedicated server based around cheap PC hardware, I thought it’d be a great idea to bring back an old backup solution. Dirvish is a neat set of scripts that combine the goodness of hard-links and rsync. The goal is that Dirvish creates a full backup once and then stores just the changes of the target file system on the backup system.

Installing dirvish on a Debian system is fairly easy:

apt-get install dirvish

Then, one merely has to copy the example configuration files into place. The first one is the master configuration file that goes in /etc/dirvish and can be found in /usr/share/doc/dirvish/master.conf.

## Example dirvish master configuration file:
bank:
	/backup
exclude:
	lost+found/
	core
	*~
	.nfs*
Runall:
	root	22:00
expire-default: +15 days
expire-rule:
#       MIN HR    DOM MON       DOW  STRFTIME_FMT
	*   *     *   *         1    +3 months
#	*   *     1-7 *         1    +1 year
#	*   *     1-7 1,4,7,10  1
	*   10-20 *   *         *    +4 days
#	*   *     *   *         2-7  +15 days

Under bank: is going to be the place on your machine that contains all the backups. I’d suggest making this its own file system, as dirvish can eat inodes like there’s no tomorrow. Next, I’d suggest adding /proc and /sys under the global exclude: section just to ensure you don’t back these directories up.

Now, you have to make your first machine directory for backups (known as a vault in dirvish speak). This directory structure will be under the directory in the bank section from above.

mkdir -p /backup/example.com/dirvish

Now, just copy the default.conf example from /usr/share/doc/dirvish/examples into example.com/dirvish and edit.

client: thishost
tree: /
xdev: 1
index: gzip
log: gzip
image-default: %Y%m%d
exclude:
	/var/cache/apt/archives/*.deb
	/var/cache/man/**
	/tmp/**
	/var/tmp/**
        *.bak

The key points in this file file are the client, xdev, and exclude directives. Merely change your client: to be the machine IP or hostname that you’re backing up. Xdev tells rsync to traverse file systems on the machine; generally, you’ll want to be careful with this setting, specially if you mount NFS shares. Lastly, update the exclude list for this particular machine. If you’re backing up the local machine using dirvish, be sure to exclude the dirvish bank directory!

Now, you’re all set to create the first backup with:

dirvish --vault example.com --init

If all goes well, you’ll have a new directory under /backup/example.com with the current data and a copy of the target. If there was an error, be sure to remove the failed backup attempt from /backup/example.com and rerun the dirvish command after fixing the error.

Now, the only thing left is to run dirvish-runall via cron at some convenient time and you’re on your way to having a decent backup solution. Besure to read the remainder of the dirvish documentation to pick up the finer points of configuration.

Leave a Comment more...

Quick Howto on Lustre 1.8

by Alex on Nov.24, 2009, under Purdue

Everybody likes fast file performance and recently I’ve been twiddling with different distributed/parallel/clustered file systems for fun and excitement. Tonight was Lustre’s turn to be toyed with. Below is how I got a small/slow Lustre going on my laptop using VirtualBox.

First, install CentOS 5 on some VMs, perhaps three? Then download the RPM packages from Sun/Lustre.org. On the servers:

rpm -ivh kernel-lustre*
rpm -ivh lustre-1.8.1.1* lustre-ldiskfs* lustre-modules* e2fsprogs*

On the clients, install the appropriate kernel package for the patch-less client and then the lustre-client packages:

rpm -ivh --force kernel-2.6.18-128.7.1.el5.i686.rpm
rpm -ivh lustre-client-1.8.1.1-2.6.18_128.7.1.el5_lustre.1.8.1.1.i686.rpm lustre-client-modules-1.8.1.1-2.6.18_128.7.1.el5_lustre.1.8.1.1.i686.rpm

Now, specially if your VMs have one network for the outside world through NAT and one network for inter-VM communication, you should add the following line to /etc/modprobe.conf to make Lustre find and use the correct interface:

options lnet networks=tcp0(eth1)

Now it’s time to set up the meta-data and management servers:

mkfs.lustre --reformat --device-size=250000 --fsname lustre --mdt --mgs /tmp/mdt
mkdir -p /lustre/mds
mount -t lustre -o loop /tmp/mdt /lustre/mds

Now it’s time to set up the servers and storage targets (in this simple example, there is only one server per target). Run this on all the storage servers:

mkfs.lustre --reformat --device-size=10000000 --fsname lustre --ost --mgsnode=@tcp0 /tmp/ost
mkdir -p /lustre/ost
mount -t lustre -o loop /tmp/ost /lustre/ost

After the client has finished rebooting into it proper kernel, mounting the file system is straight forward:

mount -t lustre @tcp0:/lustre /mnt

Of course, now it’s time to write files into our file system to see that it really works:

dd if=/dev/zero of=/mnt/foo bs=1M count=100
lfs getstripe /mnt/*

At this point, if you have multiple storage targets, then you’ll see that your large file got written to just a single target. That’s sad, since we were hoping for super-ultra-fast file I/O from our VMs running Lustre. Thankfully, this can be easily fix:

mkdir /mnt/super-fast
lfs setstripe -c 2 /mnt/super-fast
dd if=/dev/zero of=/mnt/super-fast/bar bs=1M count=100
lfs getstripe /mnt/super-fast/*

Aaah, there we go. You should have gotten “twice” the performance in MB/s reported from dd and you should see that now your large file of zero’s got written out to two different targets. Of course, the Lustre manual has all sorts of useful tuning suggestions and things to try to get working (like shared targets and proper failover). Many things to try, but I just thought I should document all this in a place that I could later find it easily. As for where I got this, I mostly found quick bits at this blog and the stranger stuff in the Lustre 1.8 manual when I got stuck on my network problems.

Comments Off more...

Super Computing 2009 – Final Day

by Alex on Nov.20, 2009, under Purdue, Tinkergeek

So, I didn’t have much of a chance to update the blog with blow-by-blow action updates at SC this year.. Oops. However, the show was a blast. The Purdue team won the “lowest power consumption” award. It was certainly a surprise, given another team had a generally more efficient power design. In any case, we’re certainly happy we won an award.

I did not get much of a chance to walk around the floor between working both of Purdue’s booths. However, I did hear about and see one or two interesting things. The first was the new ethernet gear coming out of Voltaire. They can confederate several of their switch chassises to form a virtual switch that allows an LACP bond between the chassises to be active-active, which seems to be all the rage these days. After visiting the Voltaire booth, Cisco also has various solutions.. from the virtual port channel on the nexus gear and the virtual switch chassis on the catalyst gear. Also, it appears that perhaps Cisco will be implementing something like Woven’s multi-path layer 2 magic, so that could be fun too.

Lastly, the most interesting company on the floor that I saw was the “Two Guys and a Cluster” folks. Really, it’s just two guys that live on opposite coasts that help people acquire and run clusters. They seem to be a nice middle ground between a value-added cluster vendor and just buying hardware off a website and doing it all yourself. Their web address, with the start of an interesting blog, can be found here.

Comments Off : more...

Super Computing 2009 – Day 2

by Alex on Nov.16, 2009, under Purdue

The day started early with the team wondering to the convention center to start validating all our applications. We found that the center’s power dropped voltage under load enough to cause us to overshoot our current problems badly. After taking account of all our hardware, we removed some memory and turned off our spare hardware which solved our problems.

At the end of the day, everything was good to go for the morning when we start running our benchmark for the judging. I’ll leave you with a picture of our booth this year:

sc09 booth photo

Comments Off : more...

Super Computing 2009 – Day 1

by Alex on Nov.16, 2009, under Purdue

After getting up at 3AM eastern time to make our flight at 6AM, the Purdue Student Cluster Challenge team made it to Portland, Oregon, on time and not terribly tired. The challenge organizers had our box sitting in our booth and we took no time in starting to unbox it.

For now, here is a link that talks about the challenge for this year: InsideHPC.

Comments Off : more...

Transparent Squid Proxy using WCCP

by Alex on Oct.13, 2009, under Tinkergeek

Using a web caching proxy can help save bandwidth and provide a good log of web traffic. At the house, I have a slow-ish DSL link to the world and would certainly love to have video not freeze when also browsing the web. And as a friend points out, sometimes it’s easier to have a log of traffic when digging into problems. Although, my one biggest pet peeve is having to manually reconfigure my laptop when moving between campus and home. OSX does have the concept of “network locations,” but usually half the time I forget to change it. So, the only solution to this problem is a transparent web-caching proxy server. Thankfully, I run a server in the basement for fun and excitement anyway.
First things first, one needs to set up the “Web Cache Control Protocol” on the router. At home, I use a Cisco 871, so I issued these changes:

config terminal

ip wccp web-cache
interface vlan1000
  ip wccp web-cache redirect in

exit

This sets up WCCP and applies it to the virtual network my wireless network uses. Next, we need to set up the Ubuntu box with Squid, thankfully that’s pretty easy. “apt-get install squid” is good enough to get the package. Then we need to ensure these lines are in squid.conf, the rest of the default options are probably good enough for a first go:

wccp2_router IP_OF_ROUTER
wccp_version 4
wccp2_forwarding_method 1
wccp2_return_method 1
wccp2_service standard 0
http_port 3128 transparent
acl our_networks src WORKSTATION_SUBNET/24
http_access allow our_networks

So, at this point, the router is using WCCP and our server has Squid going. Now, we just need some magic to tie the two together. This is done using a magical combination of iptables and a GRE tunnel. First things first, we need to load the “ip_gre” kernel module on boot: “echo ip_gre >> /etc/modules” should do the trick. For now, “modprobe ip_gre” is good.
Then we need to construct the tunnel by issuing the following commands:

/sbin/ip link set wccp1 mtu 1476
/sbin/ip tunnel add wccp1 mode gre remote ROUTER_IP local MACHINE_IP dev ETHERNET_DEVICE
/sbin/ip addr add MACHINE_IP dev wccp1
/sbin/ip link set wccp1 up
/sbin/sysctl -w net.ipv4.conf.wccp1.rp_filter=0
/sbin/sysctl -w net.ipv4.conf.ETHERNET_DEVICE.rp_filter=0
/sbin/iptables-restore < /etc/default/iptables

This constructs the tunnel and brings it up. At the house, the server has a separate connection on a separate virtual network from everything else. WCCP should handle the situation where your web cache is in the subnet whose traffic is getting redirected, but I wanted to ensure there weren't going to be any difficulties.
Now, the final piece of the puzzle are the iptables rules that search for traffic coming down the tunnel and redirects them to Squid:

# /etc/default/iptables
# Allow in everything, from everywhere
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT

*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
# Reroute HTTP requests to the proxy server
-A PREROUTING -i wccp1 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3128
-A PREROUTING -i wccp1 -p tcp -m tcp --dport 8000 -j REDIRECT --to-ports 3128
-A PREROUTING -i wccp1 -p tcp -m tcp --dport 8080 -j REDIRECT --to-ports 3128
COMMIT

Once all of this is in place and running, you should be able to see web traffic get cached and logged into /var/log/squid3/access.log. From my logs after several days, I'm seeing the cache serve about 4% of my traffic. Given the amount of dynamic content on the Internet today, that's not very surprising. What's it worth all this effort to get going? Probably not, but it was sure fun.
I need to point out that most of the content of this post can be found on the Squid site and wiki, and that community should receive any positive credit for their fine work.

Comments Off : more...

GlusterFS

by Alex on Oct.13, 2009, under Purdue

At work we’ve been evaluating different distributed file systems in our spare time. Currently, we use one large, centralized filer and have seen problems being able to push as many input/output operations through as we’d like. While that’s mostly a backend disk problem, wouldn’t it be great to have a storage system that grew as we added more cluster nodes?
In that hope, we tested some pretty alpha-level pNFS code, some Hadoop, and now GlusterFS. All these seems to have some faults, but this is what I found for Gluster…
Downloading the RPMs from the main FTP repository and installing them was pretty painless on RHEL5. The documentation is pretty spare and misleading, but eventually whipping up these config files made it all go:

#glusterfsd.vol
volume posix
  type storage/posix
  option directory /glusterfs
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

and

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host foobar-0.example.com
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host foobar-2.example.com
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp
  option remote-host pfnstest-003.example.com
  option remote-subvolume brick
end-volume

volume remote4
  type protocol/client
  option transport-type tcp
  option remote-host foobar-4.example.com
  option remote-subvolume brick
end-volume

volume remote5
  type protocol/client
  option transport-type tcp
  option remote-host foobar-5.example.com
  option remote-subvolume brick
end-volume

volume remote6
  type protocol/client
  option transport-type tcp
  option remote-host foobar-6.example.com
  option remote-subvolume brick
end-volume

volume replicate1
  type cluster/replicate
  subvolumes remote1 remote2 remote3
end-volume

volume replicate2
  type cluster/replicate
  subvolumes remote4 remote5 remote6
end-volume

volume distribute
  type cluster/distribute
  subvolumes replicate1 replicate2
end-volume

volume writebehind
  type performance/write-behind
  option window-size 4MB
  subvolumes distribute
end-volume

volume cache
  type performance/io-cache
  option cache-size 1024MB
  subvolumes writebehind
end-volume

The backing storage for this GlusterFS was an Ext3 file system carved out of LVM and housed on HP SATA disk trays. Mounting up that file system and running some simplistic tests, I found that using large file sizes the file system performance was about at the maximum network speed and that using small file sizes the performance was in the 5-10MB/s range. Not bad for an hour or two’s worth of effect.

3 Comments : more...

Another Install Day

by Alex on Jul.19, 2009, under Purdue

It appears our last cluster was such a hit that once again my department at Purdue is going to do a single day, multi-hundred node cluster installation. Last year, the work went very quickly and we had the cluster racked before lunch. This year, we asked for fewer volunteers and added an extra step in the process, installing 10GigE network adapters. Hopefully that will slow things down enough so the VIPs can actually see us working.

As always, the marketing folks put together a short little clip to push the day:

The racking and stacking is pretty much just a matter of man power. Getting the software onto the systems is done using RHEL’s Anaconda and several scripts. All the magic was described by some of my coworkers in a paper for the USENIX’s LISA 2008 Conference.. A copy can be found here. It’s pretty fun stuff.

Comments Off :, more...

OSG Storage Forum – Fermilab

by Alex on Jul.05, 2009, under Purdue

Yeah, I’m a tad behind on updating my blog… So, just this week several of us from work went to Fermilab to participate in the OSG Storage Forum. Mostly, it was a couple day talk for interested parties participating in the grid to talk about their various storage solutions. The biggest presentations came from the folks involved with the CMS and ATLAS projects, stemming from the work at the LHC. To support physicists around the nation and the globe, there are a whole series of sites dedicated to providing access to the terabytes of data flowing from the LHC; all of it just waiting to be analyzed.

The biggest reason I went was to hear about other people using Hadoop as a replacement for a piece of software called dCache… Oh yeah, and it was held in the most awesome office building in the world:

Fermilab

That’s pretty much the interesting bits of that visit. Though, I thought Fermilab was pretty nifty.

Comments Off :, more...

TeraGrid 2009 – Arlington, VA

by Alex on Jul.05, 2009, under Purdue

During June, I traveled to Arlington, VA, for the Teragrid 2009 conference for work. This is certainly an interesting conference. For those that don’t know, the Teragrid is “an open scientific discovery infrastructure combining leadership class resources at eleven partner sites to create an integrated, persistent computational resource.” In other words, a fairly big project put together by the NSF to create a national cyberinfrastructure to serve the needs of science.

I was part of the student program, which included 120 other high school, undergraduate, and graduate students from all around the nation. It was certainly interesting listening to the talks given by middle and high school educators about how they are trying to integrate computer simulation and visualization into the classroom.

During the conference, I mostly had two goals: listen to talks and present some of the work being done at Purdue. I had a small poster in the poster session about deploying Hadoop and how Purdue envisions using the Hadoop Distributed File System to support high throughput computing (which, I hear we’re known for doing quite well). Also, we had a small talk in the Education and Outreach Track about how Purdue is using the SC Cluster Challenge to support undergraduate exploration of High Performance Computing. Thankfully, both my little poster and the talk drew a lot of positive attention.

Sadly, the Teragrid conference seems very focused around talking about the science being done on the grid and not so much about the technologies that have been deployed and developed to make “the grid.” (Not to say the science isn’t important, just most of it is way over my head.) The best parts of the conference were finding other site admins to talk with and the poster session.

And, here’s a picture of the capital building from when we ventured away from the hotel on the last day:
US Capital Building

Comments Off : more...

Looking for something?

Use the form below to search the site:

Still not finding what you're looking for? Drop a comment on a post or contact us so we can take care of it!

Visit our friends!

A few highly recommended friends...