Community Digital Archiving

Self-Sufficient Culture, Heritage and Free Software

Varieties of community archives

  • February 12, 2013 4:48 pm

I was chatting with someone this morning about the different challenges facing community groups.  In particular, we were talking about long-term storage of, and access to, information generated as part of a project.  Sometimes that information gets stored outside the community’s or the project’s control, sometimes it doesn’t get kept at all, and sometimes it is simply hoped that someone else will do the storing.

The choice of Free and Open Source software that we have made here in Assynt shows that it is possible to retain control over the information generated about your own location, project or community.  Technology fashions come and go, the current one being storing data “in the cloud,” but such options may not serve the long-term needs of communities, especially when that information is so bound up with who we are and with what makes our community matter to us.

So whether you are involved in tourism, scientific projects, archaeology, geology, botany, the social sciences, or the humanities and cultural projects, the need to store information remains.  As time passes, the body of information collected becomes more valuable and more relevant to the community.  So when thinking about your project, remember that you can also manage its information legacy, and do so in a way that has long-term value for your community.

Distributed, archivable genealogical research

  • January 25, 2013 10:41 am

Here’s an interesting idea – using TiddlyWiki as a genealogical research notebook.  TiddlyWiki is a single HTML file containing JavaScript, which in effect lets you create a local web site as one standalone file.

Many community archives are of interest for their genealogical research potential.  If family researchers used TiddlyWiki in the way suggested, it would be an easy task to add the TiddlyWiki file to a DSpace community archive, where its contents can be of wider interest.  TiddlyWiki is a reasonable way to overcome the barriers posed by different technology platforms while still making the results available to all.  There is even a wrapper around TiddlyWiki for Android tablets, Andtidwiki, which could be useful for some researchers.
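As a rough sketch of what such a deposit might involve: DSpace can ingest items prepared in its Simple Archive Format, one folder per item holding a Dublin Core metadata file, a contents listing and the files themselves.  The layout and command below are purely illustrative (the email address, collection handle and file names are invented):

    import/
      item_000/
        dublin_core.xml    (title, author and description of the notebook)
        contents           (a single line: notebook.html)
        notebook.html      (the TiddlyWiki file itself)

    # run on the DSpace server to add the item to a collection
    [dspace]/bin/dspace import --add --eperson=archivist@example.org \
        --collection=123456789/42 --source=./import --mapfile=./notebooks.map

The same thing can of course be done through the web submission interface; the command line is simply convenient when depositing several notebooks at once.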

TiddlyWiki suits this purpose, as it suits many others, because it is non-linear by nature; for the same reason it takes a bit of effort to wrap your head around how it works.  A good introduction to the principles can be found in this PDF, and there are many “getting started” tutorials on the net.

Well, it can be done – DSpace on a Raspberry Pi

  • January 21, 2013 5:21 pm

The little Raspberry Pi computer, designed to re-create that wonderful sense of being able to do things yourself that those of us who cut our teeth in the early days of personal computing thrived on, is bringing the fun of general-purpose computing back to all ages, especially kids.  It’s wonderful to see the blog posts and Twitter comments, from parents especially, proud of what even very young children achieve on this capable little machine.  The combination of that sense of the possible and an astonishing price tag of around £25 has resulted in great success, well worthy of the foresight of the small and principled team that made it possible.

But can it run a DSpace archive?

And the answer is Yes!  Well, maybe yeeees-ish…

While Free Software in general tends to be pretty efficient with resources, Java-based software like DSpace isn’t known for that.  But here we are, the Pi in operation, dwarfed by the only USB network stick I had handy:-

Raspberry Pi running DSpace

The drawback isn’t memory, of which the Pi only has 256MB (although the latest versions have – phoooaaarr – 512MB), but rather the hit on the processor.  Start-up takes over 10 minutes, and the first time each page is displayed the processor works hard.  Below is a performance monitor screenshot showing the processes and the processor working flat out:-

I know I can, I think I can...

And while it doesn’t show much, here’s the DSpace instance freshly minted:-

Squeaky new DSpace screen

The first time you visit each page it takes a while to arrive, but after that the speed is quite reasonable.
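Fitting a Java stack into the Pi’s 256MB also means keeping Tomcat’s heap on a tight leash.  Without going into the exact settings, on a Debian-derived system like Raspbian the packaged Tomcat reads its options from /etc/default/tomcat6 (or tomcat7), and something along these lines is the usual way to squeeze things in (the values are illustrative rather than a recommendation):

    # /etc/default/tomcat6 -- keep the JVM small enough to leave room for PostgreSQL
    JAVA_OPTS="-Djava.awt.headless=true -Xms48m -Xmx160m -XX:MaxPermSize=64m"

Too small a heap and DSpace will run out of memory; too large and the Pi starts swapping, which is slower still.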

So yes, it can be done, and it has been an interesting exercise.  I used the standard Raspbian Linux distribution for this, and the installation was very straightforward.
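For anyone wanting to repeat the exercise, the broad steps on Raspbian look something like the following.  Package names vary with the release, and the paths and database details below are placeholders rather than a record of this particular installation:

    # prerequisites: Java, build tools, the database and the servlet container
    sudo apt-get install openjdk-7-jdk maven ant postgresql tomcat7

    # a database user and database for DSpace
    sudo -u postgres createuser --pwprompt dspace
    sudo -u postgres createdb --owner=dspace --encoding=UNICODE dspace

    # unpack the DSpace source release, set the database details in
    # dspace/config/dspace.cfg, then build and install to /dspace
    cd dspace-src-release/dspace
    mvn package
    cd target/dspace-*-build        # the exact directory name varies by version
    sudo ant fresh_install

    # hand the web application to Tomcat (copying is simplest on a one-off box)
    sudo cp -r /dspace/webapps/xmlui /var/lib/tomcat7/webapps/
    sudo service tomcat7 restart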

Why would you do this?  Well, in the Raspberry Pi tradition, because you can.  But there may be other reasons too.  It may be useful in some instances to have a heavily distributed series of DSpace instances, each of which carries a light load but which together form a significant archive.  It may be that a particular project needs an Archive on an impossibly tight budget but still needs to stick to standard software; with the Pi, you’ll get change from £50 after buying a wireless USB stick and SD-card storage.  Or, and this is an interesting use case, you may need an Archive somewhere the power supply is very limited: the Pi needs less than 5 watts to run.

Archive Training – Joint Venture with Assynt Learning

  • January 18, 2013 11:54 am

As part of the winter learning programme developed by Assynt Learning, training in the use of the Archive will be delivered in February and March 2013.  Assynt Learning’s welcome involvement is a sign of the community nature of the Archive.  Anyone interested in taking part should contact Sharon or Sandra at the Leisure Centre on 844123.

Complete disk failure – DR plan to the rescue

  • January 17, 2013 4:44 pm

The Assynt Community Digital Archive is built mostly on five virtual servers, separating services across virtual machines to aid resilience and to ease system updates by containing risk to just the service in question.  This tactic has other advantages, as we shall see.  Four of the virtual servers are required; the fifth is a remote-access virtual desktop system, not strictly required but definitely useful in a disaster recovery situation.  This article uses technical terms which will mean most to a technical reader, but the outcome should be of general interest.

Just before Christmas 2012, remote monitoring software showed that the entire archive was down, all five virtual machines.  The firewall, a separate little unit, was still accessible via ssh, and through the firewall it was possible to reach a tiny £100 test server Stevan uses (an HP microserver with just 1GB of memory and a 2TB disk).  The test server also acts as a staging post for backups: the main systems back up to the small server, and from there the backups are copied to removable disks for offsite storage.  As it happened, Stevan was unable to go to the Archive for a few weeks, but this represented a good opportunity to look at disaster recovery.  One of the glories of Free and Open Source software is that it tends to be quite modest in its hardware requirements.  With virtual systems, you can also cut your cloth to suit, as will be shown.
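Without going into the exact scripts, a staged backup arrangement of that kind is typically just rsync over ssh, something along these lines (the host names and paths are invented for illustration):

    # on each main virtual server: push the night's backup to the microserver
    rsync -az --delete /srv/backups/ backup@microserver:/srv/staging/$(hostname)/

    # on the microserver: copy the staging area onto the mounted removable disk
    rsync -az --delete /srv/staging/ /media/offsite-disk/archive-backups/
    sync && umount /media/offsite-disk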

The backups on the little microserver already contained all the data and, as it happened, copies of the main virtual server containers had been taken just a few weeks before.  As a test server, it was already set up as a KVM virtual system host, so it was a simple matter to copy over the XML definition files for each virtual system.  Each file needed modifying slightly, as there were differences between the file format on the Debian Squeeze installation on the microserver and the Scientific Linux installation on the main server.  (This difference is now a thing of the past, as the main server now also runs Debian.)  Other edits changed the location of the container files, though it may have been possible to put in a symlink to the old location instead.  Further edits were required to make sure the memory allocated to each server fitted into the miserably small 1GB available.  All of this was done without much trouble, and each server was brought up with manual commands.  This is quite important, as automatic starts are not wanted on this server: if it rebooted for some reason and a second instance of the same virtual machine came up on the network, address clashes would occur.
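To give a flavour of the kind of edits involved, a libvirt domain definition holds both the memory allocation and the path to the disk image, which were the bits that needed changing.  An abridged, made-up example:

    <domain type='kvm'>
      <name>archive-web</name>
      <!-- trimmed so that four guests fit inside the microserver's 1GB -->
      <memory>196608</memory>          <!-- KiB, i.e. 192MB -->
      <vcpu>1</vcpu>
      <devices>
        <disk type='file' device='disk'>
          <driver name='qemu' type='raw'/>
          <!-- repointed at where the copied container file now lives -->
          <source file='/srv/vm-images/archive-web.img'/>
          <target dev='vda' bus='virtio'/>
        </disk>
        <!-- os, network interface, console and the rest omitted -->
      </devices>
    </domain>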

The commands are quite straightforward – “virsh define <XML definition file>” to register the virtual machine, followed by “virsh start <MACHINE_NAME>” to bring it up (“virsh create <XML definition file>” does both in one step, as a transient guest).
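With a made-up definition like the one above, the session on the microserver looks roughly like this:

    virsh define /srv/vm-definitions/archive-web.xml   # register the edited definition
    virsh start archive-web                            # bring the guest up by hand
    virsh list                                         # confirm what is actually running
    # note: no "virsh autostart", so nothing comes back by itself after a reboot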

So within an hour or two the basic system was back up and running.  We were a tiny bit fortunate in that no work at all had been done on the Archive itself since the last backup, so it was possible to restore the systems to the last backup point quite quickly.  What is more, the entire process was done remotely, in this case from just 7 miles away, but it would not have mattered had the systems been on the other side of the planet.

When Stevan was later able to go in to the Archive, the additional flexibility of virtual systems came to the fore.  A small desktop machine running Debian Squeeze was brought into service, and some of the virtual machines were transferred to it, reducing the load on the brave little microserver.

This was obviously a full DR exercise rather than a mere test, and it worked well.  Some lessons that would ease the process have been noted for future reference, but in these situations it is often less a case of blindly following a disaster recovery plan and more a case of having the flexibility to work with what one has.  A virtualised Free Software environment certainly provides that.  While a community archive suffers mere inconvenience rather than business or financial peril in the event of a complete disaster, it is still a source of professional pride to be able to deliver such a robust and resilient solution.

The cause of the main server failure, as noted in the title, was the failure of three of the four disks in the IBM RAID array.  All three (but not the fourth) had the same manufacturing date.  All three were Seagates which, had they been bought from Seagate, would have carried a five-year warranty, but IBM only offers a year on the exact same disks.  Details about this event are included here.