Notes from installing OpenSolaris snv_74

I now have Solaris up and running and reasonably stable-looking, after only 12 hours of work. A number of things turned out to be bigger issues than I’d anticipated, largely because it’s been years since I last used Solaris and, frankly, Solaris’s disk partitioning and formatting tools suck.

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
zfs        10 105.0 55.3 163.3 27.3 121.0 30.4 119.2 88.4 287.1 36.2   169  1.8
zfs+c      10 112.9 59.7 181.5 30.3 127.8 29.1 118.1 86.0 424.9 52.2   198  2.1
Tags: opensolaris, server, zfs, gs, ramdisk

A new server (part 1)

A few days ago, I mentioned that my home NAS box had failed, and that I was considering replacing it with a PC server running OpenSolaris and ZFS. I’ve read a pile of ZFS docs, and it looks like the best option available to me today, so I decided to order some suitable hardware.

At that point, pretty much everything broke down. I have a hard enough time keeping track of which hardware works with Linux this week, and OpenSolaris is completely new to me. Sun’s list of officially-supported hardware is pretty sparse, and digging through their mailing list archives gets frustrating quickly. From what I can tell, it boils down to:

I was looking for a motherboard with 8 SATA ports, and was hoping that the Intel D975XBX2 (“Bad Axe 2”) would work, but 4 of its 8 SATA ports belong to a Marvell PCI-E SATA chip that doesn’t appear to be supported. I went through every single 8-port motherboard in Newegg’s catalog and ended up with the P5K WS (the ‘WS’ is important; the P5K is a different board). It only has 6 on-board SATA ports, but it includes a PCI-X slot. That’ll let me use the Supermicro AOC-SAT2-MV8, which is far and away the cheapest 8-port SATA card on the market, for a total of 14 SATA ports. That should be enough for whatever I want to throw at it. The Marvell PCI-X chip at the heart of the Supermicro card is the same one used in Sun’s 48-drive Sun Fire X4500 server, so it’s safe to assume that Sun has put a lot of effort into the driver.

Most of the rest of the system is fairly generic: a cheap nVidia 7200GS video card (the cheapest PCI-E card that Newegg carries), a nice case and power supply, RAM, and a boatload of drives.

The one odd component that I’ve added is a Gigabyte GC-RAMDISK with 1 GB of RAM. The GC-RAMDISK is a battery-backed SATA ramdisk; it looks like a hard drive to the system and can survive up to 18 hours without power. I’ve had my eye on this thing for years, and it looks like it’ll be a perfect external log device for ZFS. I had to ask around to find out how ZFS will behave if the device fails. It looks like manual intervention may be required after an 18+ hour power outage, but it should be pretty minimal. I’m planning on posting some benchmarks here once I’ve had a chance to try it out.

Assuming that I’m able to get this whole mess to work at all, I should have lots to write about here over the next week or so. I’m going to start by explaining why I want to use Solaris instead of Linux or *BSD, and why I’m building something instead of buying a prebuilt NAS box.

Tags: home, server, opensolaris, zfs, raid

Why not Linux (new server part 2)

So, as part of my new home server series, I want to explain why I’m using OpenSolaris instead of Linux.

I’ve used Linux since 0.97.1, in August of 1992. I’ve had at least one Linux box at home continuously since 1993 or so. I’ve had a few small chunks of my code added to the kernel over the years. I’ve built several install disks and one embedded appliance distro from scratch, starting with a kernel and busybox and going on up from there. I’ve written X drivers, camera drivers, and drivers for embedded devices on the motherboard. I’ve managed Great Heaping Big Gobs of Hardware at various jobs. Basically, I know Linux well, and I’ve used it for almost half of my life.

That in itself might mean that it’s time for a change–professionally, I’ve been very tightly focused on Linux, and diversity is a good thing. But that’s not why I’m using Solaris this week. I’m using it because I’m fed up with losing data to weird RAID issues with Linux, and I believe that OpenSolaris with ZFS will be substantially more reliable long-term. Things I’m specifically fed up with:

In short: everything works great when things are perfect, but building a reliable multi-drive storage system requires careful component and kernel compatibility work, and then you have to stay right on top of things if you want everything to keep working. When things stop working, they usually fail badly. That’s almost the complete antithesis of what I want for home: plug it in, and it just keeps working. I don’t want small failures to cascade through the system. Little failures should be isolated, identified, and automatically repaired whenever possible. OpenSolaris and ZFS seem to provide that, while Linux with md and ext3 does not.

That’s why I’m planning on using ZFS. My logic for building a server vs. buying another little NAS box is simple: none of the little NAS boxes on the market use ZFS right now, and none of the cheap ones have room for more than 5 drives. I’m planning on using a double-parity system (RAID 6 or ZFS’s raidz2, where the system can cope with a 2-drive failure) plus a spare drive, which would leave only 2 of the 5 drives for data. The only way I could get enough space out of 2 data disks would be to use 1TB drives, and they’re too pricey right now.
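The drive-count arithmetic behind that is simple enough to spell out. This is a minimal Python sketch; the helper names and the 0.5 TB drive size in the example are hypothetical, and it ignores ZFS metadata and other overhead, so real usable space will be somewhat lower.

```python
# Raw capacity arithmetic for a double-parity array plus a hot spare.
# Hypothetical helpers; ignores ZFS metadata and formatting overhead.

def data_disks(bays: int, parity: int, spares: int) -> int:
    """Drives' worth of space left for data after parity and spares."""
    remaining = bays - parity - spares
    if remaining < 1:
        raise ValueError("layout leaves no drives for data")
    return remaining


def usable_tb(bays: int, parity: int, spares: int, drive_tb: float) -> float:
    """Approximate usable space in TB (data drives times drive size)."""
    return data_disks(bays, parity, spares) * drive_tb


# A 5-bay NAS with raidz2 (two drives' worth of parity) and 1 hot spare:
print(data_disks(5, parity=2, spares=1))               # 2
print(usable_tb(5, parity=2, spares=1, drive_tb=0.5))  # 1.0
print(usable_tb(5, parity=2, spares=1, drive_tb=1.0))  # 2.0
```

Note that raidz2 spreads parity across all of its drives rather than dedicating two of them to it, but the space cost works out to the same two drives’ worth.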

So, I’m willing to spend the time to build a somewhat complex server because I believe (hope?) that it’ll save me time in the future, and it’ll let me avoid ever having to do the reconstruct-from-the-source dance again. I don’t think I lost anything critical last weekend, and I’m reasonably confident that I’ll be able to get things limping along well enough to recover data anyway, but I’ve now done this 3 times in the past 4 years, and I’ve had it.

Coming up soon: backups, OpenSolaris hardware compatibility, and GC-RAMDISK performance benchmarks. Stay tuned :-).

Tags: linux, solaris, opensolaris, zfs, raid, storage

ZFS and The Holy Grail of Storage

So, the comments on yesterday’s post about my nasty RAID failure encouraged me to spend some time looking at ZFS on OpenSolaris, and I really like what I see. I’ve ordered some new hardware, so I should have lots to write about by next weekend.

Reading the ZFS docs reminded me of my Holy Grail of Storage: a storage system that could actually do reasonably smart things with 3–5 drives. Imagine a system where you could start with 3 drives and simply plug new drives in as you need more space, without worrying about RAID or data layout. When you run out of slots, just unplug the oldest, smallest drive and plug in a new, larger one; the data will resync, giving you more disk space without any special work on your part. For bonus points, you’d be able to designate specific bits of your data as more or less important, so BitTorrent files might not be replicated at all, while your Word documents might be replicated onto every available drive.
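The importance-based replication idea can be made concrete as a small policy function. This is a hypothetical Python sketch of the wished-for behavior, not an existing ZFS feature: the three importance levels and their replica counts are made up for illustration. The closest real ZFS knob is the per-filesystem copies property, which allows 1 to 3 copies rather than one per drive.

```python
from enum import Enum


class Importance(Enum):
    """Hypothetical importance levels for a piece of data."""
    SCRATCH = 1   # e.g. BitTorrent downloads: keep a single copy
    NORMAL = 2    # ordinary files: two copies when possible
    CRITICAL = 3  # e.g. Word documents: one copy on every drive


def replica_count(importance: Importance, drives: int) -> int:
    """How many copies to keep, never more than the number of drives."""
    if drives < 1:
        raise ValueError("need at least one drive")
    if importance is Importance.SCRATCH:
        return 1
    if importance is Importance.NORMAL:
        return min(2, drives)
    return drives  # CRITICAL: replicate onto every available drive


def raw_space_gb(size_gb: float, importance: Importance, drives: int) -> float:
    """Raw disk space consumed once all replicas of the data are written."""
    return size_gb * replica_count(importance, drives)


# With 3 drives, 10 GB of data costs 10, 20, or 30 GB of raw space:
for level in Importance:
    print(level.name, replica_count(level, 3), raw_space_gb(10, level, 3))
```

Because CRITICAL data is tied to the drive count, plugging in a fourth drive would raise it to four copies with no extra work, which is the plug-it-in-and-forget-it behavior described above.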

I’ve wanted that for years, but I’ve largely dismissed it as a pipe dream, because it doesn’t fit cleanly into the drive/RAID/LVM/filesystem model that everything uses. The only thing that I’ve seen that even comes close is Drobo, and it’s supposedly fairly slow and really just too “magic” for me to trust.

I realized this morning that it’d be easy to build a storage system like this using ZFS. Just create a zpool with 3 drives to start, and then create ZFS filesystems with copies=2 on top of it. When you add new drives, just add them to the pool. Blindly removing a single old drive will only leave you with a single copy of some of your files, but that shouldn’t be fatal, and ZFS can copy everything off of it if you give it a chance. There are some corner cases that will give you less redundancy: if you manage to fill the system 98% full before adding a new drive, then all of the replicas of new data will probably end up on the same disk. There are a couple of obvious workarounds, and Sun will probably add replication rebalancing at some point, if it isn’t there already.
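The 98%-full corner case is easier to see with a toy model. The Python sketch below assumes a deliberately simplified allocator (hypothetical, not ZFS’s real metaslab allocator): every copy of a block goes to the drive with the most free space, preferring drives that don’t already hold a copy of that block, and falling back to reusing a drive only when no other drive has room. Drive sizes are in arbitrary units, one unit per copy.

```python
def place_block(free: list[int], copies: int) -> list[int]:
    """Choose a drive for each copy of one block, updating free space in place.

    Toy allocator (hypothetical): pick the drive with the most free space,
    preferring drives that don't already hold a copy of this block.
    Each copy occupies one unit of space.
    """
    chosen: list[int] = []
    for _ in range(copies):
        candidates = [i for i, f in enumerate(free) if f >= 1]
        if not candidates:
            raise RuntimeError("pool is full")
        fresh = [i for i in candidates if i not in chosen]
        pool = fresh or candidates  # fall back to reusing a drive
        target = max(pool, key=lambda i: free[i])
        free[target] -= 1
        chosen.append(target)
    return chosen


def blocks_on_one_drive(free: list[int], copies: int, blocks: int) -> int:
    """Write `blocks` blocks; count those whose copies all sit on one drive."""
    free = list(free)  # don't mutate the caller's list
    count = 0
    for _ in range(blocks):
        if len(set(place_block(free, copies))) == 1:
            count += 1
    return count


# Three 100-unit drives filled to 98%, plus one new, empty 100-unit drive.
print(blocks_on_one_drive([2, 2, 2, 100], copies=2, blocks=40))  # 34
# Four drives with plenty of free space everywhere.
print(blocks_on_one_drive([50, 50, 50, 100], copies=2, blocks=40))  # 0
```

In the nearly full case, only the first 6 blocks get their second copy onto an old drive; once those fill up, the remaining 34 of 40 blocks have both copies on the new drive. With free space everywhere, no block ever doubles up.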

Tags: zfs, opensolaris, storage, raid