Community Digital Archiving

Self-Sufficient Culture, Heritage and Free Software

Main ACDA Server – Wrong hardware RAID

  • January 17, 2013 12:38 pm

At the start of the Assynt Community Digital Archive project, there were a lot of unknowns.  Among these were the likely initial disk capacity requirements and how to implement them.  It quickly became clear that the Archive would require more than just a machine in the corner, as it had to run a fairly full range of network services to be long-lasting.  Among these were directory services, email, deployment (web) services, security services and remote access capabilities.  These would be deployed as virtualised services with considerable separation between them, rather than simply running a single server, which would have been possible.  The main reason for this was resilience, and the opportunity to make changes to one sub-system without affecting the others, a principle which has worked well for me in the past.  It's the type of design decision that only experience of what it takes to run services after the initial installation can bring.

The possibility of greater complexity than is ideal therefore raised its head, and that affected the choice of hardware.  Rather than go for a NAS storage option, the choice was made for an initial local storage RAID system on the main server.  This was purely an attempt to lessen the number of systems running on the network, and therefore an attempt to reduce complexity.  The cost differences were not that great.  The choice of hardware RAID, though, was a bad one.

The server itself had to be a named brand, and as I had had good success with IBM’s x86 server range, we plumped for an x3200 running SATA disks on IBM’s M1015 RAID card, an OEM version of an LSI RAID controller.  This was the first time I had used a SATA RAID controller, previous experience being SCSI based.

The first problem was that Debian Squeeze, in 2010, did not support the controller as part of a standard installation.  For the reasons stated above, I did not want a special arrangement to shoe-horn Squeeze onto the machine, but Red Hat based distributions worked well enough.  While previously CentOS was the Red Hat clone to go for, at that time a lack of resources had left them without updates for quite a while, and their future was unclear.  Scientific Linux, though, the Red Hat clone developed and supported by CERN, Fermilab and others, looked as though it was a good option, and installation was a breeze.  All, that is, except for one compromise, which was that Red Hat and their clones do not support JFS as standard.  We'll leave the file system choice for another post.

Having installed the base system as a KVM virtual machine host, performance testing of the disks threw up horrors.  Write speeds were pitiful, and every now and again the whole machine would simply stop responding for minutes at a time.  To cut a long googling story short, the issue is that the basic RAID card has no onboard cache, and therefore no battery back-up for one, and the effect on write speeds in particular is quite astonishing.  It's hard to give an estimate of the speeds at that time, because they varied so wildly, but initial system caching was lightning fast, while subsequent speeds dropped to zero at times.
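A rough version of this kind of write test can be run with `dd`.  The target path below is a stand-in, not the actual path used on the server; the important part is `conv=fdatasync`, which forces the data to be flushed before `dd` reports a rate, so the figure reflects what actually reached the array rather than the operating system's page cache.

```shell
# Rough sequential-write benchmark; /srv/archive/testfile is a stand-in path.
# conv=fdatasync flushes the data before dd exits, so the reported rate
# includes the cost of committing the write to the array, not just caching it.
dd if=/dev/zero of=/srv/archive/testfile bs=1M count=1024 conv=fdatasync
rm /srv/archive/testfile
```

Without the `fdatasync` (or an `oflag=direct`) the test mostly measures RAM, which is exactly the misleading "lightning fast" initial impression described above.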

Then the disks started failing.  Again, let's cut the long story short.  Three disks, all Seagates with the same manufacturing date, failed.  IBM only support SATA disks for a year, though the Seagate version of the same OEM disks is warrantied for 5 years.  These two events, the pitiful performance and the disk failures, left a strong impression and have dented my confidence in lower end IBM kit.  By the time the server went dead, when the disks finally failed lemming-like, I had reason to bless the decision to implement as virtual machines, and the system could be transferred to desktop machines running a KVM host within a few hours.  Again, the recovery process deserves its own post.

The kitty was also bare, as is to be expected with community projects which do not generate an income, so the only thing to be done was to be creative.  This entailed sacrificing four of the external removable backup disks, which contained 2TB Western Digital Caviar disks, and cannibalising them for the IBM.

By this time (January 2013) Debian Wheezy was maturing, and as Debian is used on all the VMs, it would be nicer if Debian was also running on the bare metal.  This time, Debian detected the RAID controller, allowing implementation of JFS.  Wheezy also recognised an EFI based system and installed itself accordingly.  The RAID was created as three virtual disks, one small one for the Debian plus KVM system, one 1TB partition for the virtual machine containers and one 2.6TB partition for the actual archive data.

Performance under Wheezy and JFS was much more consistent, but still absolutely dreadful.  I cannot believe IBM sell, or sold, the M1015 as a RAID5 controller without the additional hardware that apparently allows it to perform.  On testing, though, if the JFS partitions are mounted with the "sync" option, the performance stops fluctuating wildly once the system resources are used up, but it then peaks at around 4.5MB/s, a good 10 times slower than it ought to be.  It is therefore a toss-up between allowing cached performance for the usual smaller writes, at the cost of stunning the machine when larger writes are necessary, and favouring consistent operation over occasional flashes of acceptable speed.
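For what it's worth, that trade-off can be made per-filesystem in /etc/fstab rather than globally.  The device names and mount points below are placeholders for illustration, not the actual layout on the server:

```shell
# Illustrative /etc/fstab entries -- device names and mount points are assumptions.
# Archive data: "sync" trades peak speed for predictable behaviour on this card.
/dev/sdb1  /srv/archive             jfs  defaults,sync  0  2
# VM containers: left async, accepting the occasional stall on large writes.
/dev/sdc1  /var/lib/libvirt/images  jfs  defaults       0  2
```

An already-mounted filesystem can be switched for testing with `mount -o remount,sync <mountpoint>`, which makes it easy to compare the two behaviours without a reboot.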

The message is simple – don’t buy an M1015 if you want to write to the disks at anything other than glacial pace.

It has to be said that, in practice, this isn’t a killer issue, but more of an occasional hassle.  However, it does feel as though it is something hanging over one’s head, and is an unnecessary distraction.

The one alternative, which is attractive, is to use software RAID.  But that defeats the point of having a nice managed RAID system, and with disks that couldn't be trusted it sounded like a daft idea.  It is also necessary to boot from a separate disk, of course, further reducing redundancy.  The portability of the overall system may allow testing this at some point in the future, though.
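Should that experiment ever happen, the software RAID route would look something like the following with mdadm.  This is a sketch only: the device names are assumptions, the commands need root, and a real migration would also involve arranging the separate boot disk mentioned above.

```shell
# Sketch of a software RAID5 array across three disks (hypothetical device names).
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
mkfs.jfs /dev/md0
mount /dev/md0 /srv/archive
# Persist the array definition so it assembles at boot (Debian path).
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
```

One genuine advantage of this route is that a software array is readable on any Linux machine with mdadm, rather than only on a compatible hardware controller, which fits the portability argument above.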