OSG Storage Forum – Fermilab
by Alex on Jul.05, 2009, under Purdue
Yeah, I’m a tad behind on updating my blog… So, just this week several of us from work went to Fermilab to participate in the OSG Storage Forum. Mostly, it was a couple of days of talks where parties participating in the grid discussed their various storage solutions. The biggest presentations came from the folks involved with the CMS and ATLAS projects, stemming from the work at the LHC. To support physicists around the nation and the globe, there is a whole series of sites dedicated to providing access to the terabytes of data flowing from the LHC, all of it just waiting to be analyzed.
The biggest reason I went was to hear about other people using Hadoop as a replacement for a piece of software called dCache… Oh yeah, and it was held in the most awesome office building in the world:

That’s pretty much the interesting bits of that visit. Though, I thought Fermilab was pretty nifty.
TeraGrid 2009 – Arlington, VA
by Alex on Jul.05, 2009, under Purdue
During June, I traveled to Arlington, VA, for the TeraGrid 2009 conference for work. This is certainly an interesting conference. For those that don’t know, the TeraGrid is “an open scientific discovery infrastructure combining leadership class resources at eleven partner sites to create an integrated, persistent computational resource.” In other words, a fairly big project put together by the NSF to create a national cyberinfrastructure to serve the needs of science.
I was part of the student program, which included 120 other high school, undergraduate, and graduate students from all around the nation. It was certainly interesting listening to the talks given by middle and high school educators about how they are trying to integrate computer simulation and visualization into the classroom.
During the conference, I mostly had two goals: listen to talks and present some of the work being done at Purdue. I had a small poster in the poster session about deploying Hadoop and how Purdue envisions using the Hadoop Distributed File System to support high throughput computing (which, I hear, we’re known for doing quite well). Also, we had a small talk in the Education and Outreach Track about how Purdue is using the SC Cluster Challenge to support undergraduate exploration of High Performance Computing. Thankfully, both my little poster and the talk drew a lot of positive attention.
Sadly, the TeraGrid conference seems very focused on the science being done on the grid and not so much on the technologies that have been deployed and developed to make “the grid.” (Not to say the science isn’t important; it’s just that most of it is way over my head.) The best parts of the conference were finding other site admins to talk with and the poster session.
And, here’s a picture of the Capitol building from when we ventured away from the hotel on the last day:

Intel Atom + ZFS = Mini NAS
by Alex on May.15, 2009, under Uncategorized
The server in the basement is currently a dual Opteron system in a one rack unit case. To say that it’s loud is an understatement. I guess it needs a lot of air flow to keep those processors cool. Since I hardly need the machine for its horsepower and just use it for storage, I decided it should get replaced with a quiet, low-power Intel Atom based computer. I found just the system from Foxconn, a small case with space for two disks. From Newegg, it cost $130 plus $20 for 2GB of memory.
I threw some disks into it and chose to install FreeBSD 7.2. FreeBSD cannot boot off of ZFS yet. To get around that, I netbooted the installer and dropped the system onto a thumb drive. I figure that flash memory won’t generally go bad, and that leaves the ZFS mirror for data.
Getting the FreeBSD installer to netboot was a little tricky. One can download the “bootonly” CD and extract it into a directory to be served off of NFS. With some DHCP magic to give the PXE bootloader on the NIC the server’s address and the file to boot from, it seemed to just go. There was one sticking point: you’ve got to tell the boot loader to instruct the kernel where to find its root file system. It turns out you can just add a line to the loader.conf file to fix it all up:
vfs.root.mountfrom="ufs:/dev/md0c"
Once the system got installed to the thumb drive, it was fairly easy to boot into single user mode and run the zpool and zfs commands to whip up a mirror volume. I copied /usr and /var into ZFS, reset the mount points using ‘zfs’, and rebooted into a nicely functioning FreeBSD-ZFS NAS.
So far, I’ve been trying to do a lot of weird I/O to see if I could get the system to crash. Thankfully, the following tuning parameters in the system’s loader.conf seem to be working really well:
zfs_load="YES"
vfs.root.mountfrom="zfs:puddle/root"
vm.kmem_size_max="1024M"
vm.kmem_size="1024M"
vfs.zfs.arc_max="512M"
vfs.zfs.prefetch_disable=1
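As a rough sanity check on those numbers, the Python sketch below parses the three size settings and compares them against the 2GB of memory in the box. It assumes the ARC is allocated out of kernel memory, so `vfs.zfs.arc_max` should not exceed `vm.kmem_size`. The helper names `parse_size` and `parse_conf` are made up for this illustration and are not part of any FreeBSD tool.

```python
# The size-related tuning lines from loader.conf, as plain text.
LOADER_CONF = '''
vm.kmem_size_max="1024M"
vm.kmem_size="1024M"
vfs.zfs.arc_max="512M"
'''

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a size such as "512M" to bytes (plain digits mean bytes)."""
    value = value.strip().strip('"')
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def parse_conf(text):
    """Parse key="value" lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


settings = parse_conf(LOADER_CONF)
kmem = parse_size(settings["vm.kmem_size"])
arc = parse_size(settings["vfs.zfs.arc_max"])
ram = 2 * 1024 ** 3  # the 2GB of memory installed in this machine

print(f"ARC cap is {arc / kmem:.0%} of kernel memory")
print(f"kernel memory is {kmem / ram:.0%} of RAM")
```

With the values above, the ARC is capped at half of kernel memory, which in turn is half of physical RAM, leaving the other gigabyte for everything else.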
It may not hold more than 2TB currently, but it seems to be performing wonderfully. Plus, it’s now a lot quieter in the basement!
One Year Gone
by Alex on Mar.29, 2009, under Tinkergeek
I also just noticed that I restarted my blog one year ago this evening. Man, has it become much more useful than any of my previous attempts!
Cloud Computing: My Perspective
by Alex on Mar.29, 2009, under Tinkergeek
What is cloud computing? To me, it is when hardware or software is run someplace “else” for you. At Purdue, it appears we have interest in doing this for researchers with computing and storage needs. But I had been doing “computing in the cloud” for a long time before getting involved with Purdue’s effort.
Tinkergeek.com has existed in one form or another for quite some time. Usually, it is hosted in a virtual machine running in someone else’s data center. Over the years, one or both of the machines behind Tinkergeek have been hosted with Linode, Slicehost, or the now-defunct Unixshell. I’ve greatly enjoyed having accounts with both Slicehost and Linode. For $20/month or the like, they have provided very solid performance and uptime. To save costs and not host all my data entirely in the cloud, the secondary machine for Tinkergeek is hosted at home on my DSL modem.
The next service that can really be defined as cloud computing is my Google Apps hosted domain. It provides my Google Calendar, Gmail, and Google Docs access. Both calendar and mail have been very stable since I moved to Google over a year ago, and after going with the “Premium” service, I have not had a single hiccup. Of course, even though I pay Google for service and “technically” have a phone number to call if everything goes to hell, I also make a local copy of all my mail and calendar data.
For entertainment, who wants to set their schedule according to some network’s broadcast schedule? A lot of people I know use TiVo devices to time-shift their television watching. Running a DVR off of over-the-air broadcast is a pain in the rear, so I just use Hulu. This is probably the only “free” cloud service I use, although it does make me watch commercials. No big deal, and it has been quite nice to be able to watch just the shows I care about whenever I want to view them. Big Media seems scared by Hulu, though, and keeps playing games with the availability of show episodes, making this definitely not the most reliable service on the planet.
A while ago, I attempted to use Amazon’s S3 service for storage in the cloud. I never found a very convenient interface for pushing and retrieving files from S3 and eventually stopped using the service. Then, Dropbox came out. It is backed by S3 for storage and provides an excellent interface that integrates very well into OS X. Plus, even though I am using the free account, Dropbox syncs all my data to the machines joined to my account. So, even if Amazon goes down, my data is still available. I like that automatic insurance.
So, that is how I do my computing in the cloud. Nothing very funky or very far out there. Webmail, shared web hosting, and file services (like Xdrive) have been around since before I even got my first computer; now it’s all just called “cloud computing.” Maybe it’s the pretty, dynamic Web2.0 interfaces on everything that make computing deserve to be high in the clouds?
10 Years of 461
by Alex on Mar.22, 2009, under Uncategorized
Back in high school, I was pretty active in a program called FIRST Robotics, For Inspiration and Recognition of Science and Technology. This year, my first FIRST team celebrated its ten-year anniversary. 461 is the team at West Lafayette Junior/Senior High School. My first year on 461 was the last year there would be only a single team in the Purdue area serving all the area’s students. Flo and the gang have helped start many teams all over the nation, but my second year (2005) would see the founding of 1646 and my third year (2006) would be when I founded 1747 at my high school. Being back in the shop at Westside was pretty amazing. Not much has changed since I was last there, asking to borrow tools for our makeshift shop in the science wing on the second floor of Harrison.
In other news, 1747 won the Buckeye Regional competition and the Buckeye Regional Engineering Inspiration Award, earning a place at the Championship Event this year. Quite amazing.
Quagga and Routing
by Alex on Mar.13, 2009, under Tinkergeek
So, after some thinking, some prodding, and some money, I started to play with networking. My home network has for the longest time had a Linux box at the front of it doing firewalling, NAT, and all the other goodies one needs on a network. eBay provided the cheap avenue to get some “real” networking gear, and after some frantic tabs and ?’s, I got a shiny new Cisco 871 configured to do NAT and take a DHCP address.
Eventually, I wanted to get IPv6 networking back into my house. After dealing with SiXXS for the longest time, I moved to Hurricane Electric’s free tunnel service. I’ve never been happier with a free service. However, my home Internet connection is a Verizon DSL line with a dynamic IP address. HE just uses a point-to-point link for providing connectivity, so I chose to home my tunnel and IPv6 space to a machine at the Purdue Computer Society. Then, I set up a static tunnel to my house and routed some space to my 871. The goal here is to always have a constant connection from the world to me, even if that constant connection leads to a box and then dead-ends. Plus, it seems sort of silly to waste a whole /48 at my house when there could be all sorts of more useful places to send my IPv6 subnet space to.
This is where Quagga comes in. I do not really want to have to maintain static routes pointing everywhere; I just want the routers to know about each other and figure out the hard parts. While I’m digging the Cisco stuff, I’m certainly not interested in buying another router, but Linux does that routing thing nicely. Quagga provides the routing protocols to populate the kernel’s routing table. It seems like a match made in heaven. (As a side note, recent Quagga builds are broken with respect to advertising IPv6 routes in BGP… Check before pulling your hair out too!)
So, for testing, I installed Quagga on two Linux boxes and gave them a simple configuration (ASNs changed to protect the innocent):
Fremont:
router bgp 65220
 bgp router-id 128.46.156.55
 neighbor 2001:470:c180:aa01::2 remote-as 65221
 neighbor 2001:470:c180:aa01::2 next-hop-self
!
 address-family ipv6
 network 2001:470:1f11:6e5::/64
 network 2001:470:c180::/48
 aggregate-address 2001:470:c180::/48
 neighbor 2001:470:c180:aa01::2 activate
 exit-address-family
Saratoga:
router bgp 65221
 bgp router-id 128.46.156.11
 neighbor 2001:470:c180:aa01::1 remote-as 65220
 neighbor 2001:470:c180:aa01::1 next-hop-self
!
 address-family ipv6
 network 2001:470:1f11:53f::/64
 network 2001:470:c159::/48
 neighbor 2001:470:c180:aa01::1 activate
 exit-address-family
fremont> show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3,
       I - ISIS, B - BGP, * - FIB route.
K>* ::/0 via 2001:470:1f10:6e5::1, he-1
C>* ::1/128 is directly connected, lo
C>* 2001:470:1f10:6e5::/64 is directly connected, he-1
B>* 2001:470:1f11:53f::/64 [20/0] via fe80::20d:93ff:fe60:9b64, eth0.11, 00:20:56
C>* 2001:470:1f11:6e5::/64 is directly connected, he-1
B>* 2001:470:c159::/48 [20/0] via fe80::20d:93ff:fe60:9b64, eth0.11, 00:20:56
S   2001:470:c180::/48 [1/0] is directly connected, null0 inactive
C>* 2001:470:c180:aa01::/64 is directly connected, eth0.11
C * fe80::/64 is directly connected, eth0.11
C * fe80::/64 is directly connected, he-1
C>* fe80::/64 is directly connected, eth0
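A quick way to confirm that the prefixes from Saratoga’s network statements really arrived over BGP is to filter the route listing for lines whose code letter is B. Here is a rough Python sketch of that check. The sample lines are copied from the fremont output; the function name `bgp_prefixes` and the regular expression are illustrative assumptions about the line layout, not anything Quagga itself provides.

```python
import ipaddress
import re

# A few lines in the layout Quagga prints for "show ipv6 route".
ROUTE_OUTPUT = """\
K>* ::/0 via 2001:470:1f10:6e5::1, he-1
C>* 2001:470:1f10:6e5::/64 is directly connected, he-1
B>* 2001:470:1f11:53f::/64 [20/0] via fe80::20d:93ff:fe60:9b64, eth0.11, 00:20:56
B>* 2001:470:c159::/48 [20/0] via fe80::20d:93ff:fe60:9b64, eth0.11, 00:20:56
S   2001:470:c180::/48 [1/0] is directly connected, null0 inactive
"""

# Code letter B, then any mix of the ">" and "*" flags and spaces, then the prefix.
BGP_LINE = re.compile(r"B[>* ]+(\S+)")


def bgp_prefixes(text):
    """Return the prefixes of every route whose code letter is B (BGP)."""
    prefixes = []
    for line in text.splitlines():
        match = BGP_LINE.match(line)
        if match:
            prefixes.append(ipaddress.ip_network(match.group(1)))
    return prefixes


# The prefixes Saratoga announces with its "network" statements.
expected = {
    ipaddress.ip_network("2001:470:1f11:53f::/64"),
    ipaddress.ip_network("2001:470:c159::/48"),
}

learned = set(bgp_prefixes(ROUTE_OUTPUT))
print("learned via BGP:", sorted(str(p) for p in learned))
print("all expected prefixes present:", expected <= learned)
```

Both of Saratoga’s prefixes show up with the B code and a link-local next hop on eth0.11, which is what a working session between the two boxes should produce.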
Virtualization Options
by Alex on Feb.01, 2009, under Purdue
From Amazon to Linode and to scientific clouds, everyone seems to love using Xen to run a service for others. It appears the open source and easily scriptable nature, not to mention the lack of licensing problems and cost, makes this a compelling option when you need to host a lot of VMs. On the other hand, it appears VMware has made headway over Xen when it comes to mainline IT and the desire to decrease cost. Nothing beats a manager’s opinion after getting flogged by the VMware sales stick.
After running VMware Server 1.0.5 for work’s small physical server to VM effort and deploying Xen for a scientific cloud, I have started to look into what’s next.
VMware Server 1.x is getting old and support has begun to wear thin. So, I hitched up an Ubuntu box with VMware Server 2.0 and started poking around. The first major difference is the command and control interface; it is now a web page with a lot of JavaScript and a plugin to view machine consoles. This is quite a bit different from the native X11 based interface that “just worked” over SSH with X forwarding. I could not for the life of me get the plugin working on my Mac natively.
The next stop for this train was more free VMware products: VMware ESXi. I had deployed an evaluation copy of ESX before and was taken aback when the only way to manage ESXi was from a Windows-only client. After some digging, I found that to truly manage ESXi installations you need a Windows-based server; only then can you get even a web interface.
It just seems wrong to rely on a Windows box to manage a fleet of VM-ized servers. VMware Server 1.x was still looking pretty good.
Then, I remembered the Xen folks were doing a lot of work toward supporting unmodified guests. The Xen HVM stuff looks very promising. It does require having a recent processor and maintaining a primary host operating system, but it allows one to have fine-grained control over the network bridges and is completely open source. Using Xen-tools, it is nearly trivial to deploy a new Linux distribution in an image, and by giving just a couple of parameters in a config file, you can boot Windows. Getting an encrypted VNC session for a console is a whole lot easier to deal with than attempting to get some sort of silly browser plugin functioning.
Now, just to figure out if sacrificing VMware image compatibility is worth abandoning VMware’s rather ridiculous management concepts for a truly free choice.
Blackboard Mail
by Alex on Jan.27, 2009, under Purdue
Purdue University is all about being #1 and ahead of the curve in terms of technology. That’s probably why we bought our online course software from a company that sells software to every other university system out there. This software was so bad, they had to change its name to get past the bad reputation it had gained.
Anyways, a function of this software is to allow professors and teaching assistants to send announcements and mail to students. That’s a fairly great feature, except that it really is not email. (The university has a great system in place so courses have mailing lists that automatically include every student registered.) As well, Blackboard by default does not in any way notify students that they have pending messages or announcements from teachers unless they happen to log in and see the magic green box indicating something new.
Of course, this default setting is fairly ridiculous, as checking yet another web site is not as fun and exciting as it would sound. The solution is to log into Blackboard, where the home page has a “My Settings” link. If one ventures there and clicks on the “My Tool Options” tab, there are many good settings to play with. The best one of all, and the one that makes the Blackboard mail system tolerable, is under “Mail” and then “Mail Forwarding”. So far, it appears that the messages it sends are just notifications about there being mail available… But, at least some notification is better than missing a note and showing up to an early morning class when it was cancelled the night before.
The methods I’ve had teachers use to distribute things:
- Chalk/Dry Erase Boards
- Sticky notes on doors
- Web pages
- News Groups
- Department Mailing Lists
- University Mailing Lists
- Mails with everyone in the To: field
- FTP
- Blackboard
- Cell phone calling trees
I cannot wait until some CS prof revives gopher to distribute class files as fax messages that get handled over VoIP and eventually call my cell phone.
SiXXS vs HE
by Alex on Jan.24, 2009, under Tinkergeek
A friend noted that the “customer service” on SiXXS was getting to be quite a problem. It appears with popularity comes more trouble tickets and increased user support effort, something SiXXS appears unwilling to provide in a kind manner. Then, we started checking Hurricane Electric’s tunnel offerings.
HE does not have a “credits” system nor do they have strict requirements on things like user registration or tunnel performance. One merely requests a tunnel from one of their POPs and you get it automatically (along with another /64 so you can provide service to your LAN right off the bat). Then, once the tunnel is up and running, it is a simple click of a link to get a /48 allocation. No human intervention or long justifications required.
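To put those allocation sizes in perspective, here is a small Python sketch using the standard `ipaddress` module. The prefix comes from the 2001:db8::/32 documentation range and only stands in for a routed /48; it is not a real HE allocation, and the variable names are just for illustration.

```python
import ipaddress

# Hypothetical /48 from the IPv6 documentation range (2001:db8::/32),
# standing in for a routed /48 allocation.
allocation = ipaddress.ip_network("2001:db8:1234::/48")

# A /48 leaves 64 - 48 = 16 bits for numbering /64 LANs.
lan_count = 2 ** (64 - allocation.prefixlen)

# subnets() returns a generator, so only the first few /64s are built here.
subnet_iter = allocation.subnets(new_prefix=64)
first_lans = [next(subnet_iter) for _ in range(3)]

print(f"{allocation} holds {lan_count} /64 networks")
for lan in first_lans:
    print(lan)
```

The single /64 that comes with the tunnel covers one LAN; the /48 is enough for 65,536 of them, which is why it only makes sense to request one when there are several networks to route.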
A big problem with tunnels is that latency can start to get out of hand. My native IPv4 connection gets under 80ms to most places on the Internet and usually it’s under 40ms to popular locations. To get anywhere interesting (Google and the like) over IPv6, it’s at least a 100ms journey.
Now, after getting set up with an HE tunnel, I find my latency problems are a little less severe. In fact, I started running “traceroute” and noticed that most of the SiXXS stuff usually bounced around their POP for a bit and then jumped onto the HE network. After looking more into HE, it appears they have one of the largest IPv6 networks out there and they actually have customers using their network! SiXXS was a great starting point, but I wonder why people do not just go with the kinder, friendlier, and faster tunnel provider?
As a side note though, HE only provides static tunnels. No strange UDP encapsulated tunnels here to get past NAT or firewall issues.