One of the delights of running a little Raspberry Pi as a primary server for a couple of users, all their devices and several domains, is the achievement associated with getting quarts out of pint pots.  While in our case the main motivation is to use as little power as possible, as we are off-grid, I would probably use a Pi where I could even if we were on-grid, as it is so much more satisfying than simply throwing hardware at the delivery of office services.  One part of that equation is anti-spam, and a while ago I wrote about migrating from spamassassin, which is a resource hog, to dspam, which isn't. I then had to go back to spamassassin as dspam succumbed to bitrot and was no longer supported on Debian/Raspbian, before I found a way of running the venerable bogofilter as a server-based anti-spam solution. The previous posts are here and here.

While the initial setup of bogofilter against postfix and dovecot was not especially easy, as documentation is thin on the ground, we have been impressed with its accuracy, and its resource use is so light as to be mere noise rather than even registering in top or htop.  One design choice was whether to use the original Berkeley database or the more modern sqlite. Now I deeply distrust Berkeley database.  This is due to an incident 20 years ago when I ran Cyrus IMAP as an IMAP server in a company, and had to become quite adept at database recovery, as Cyrus used Berkeley db as its message store.  When dovecot matured, with the option of using maildir, I migrated that company as quickly as I could to get away from the database trouble.  So I have shied away from Berkeley ever since.  Meanwhile, I find that there are quite a few instances of sqlite in my life, and none seem to give trouble.  As bogofilter can be installed under Debian in either its db or sqlite versions, I plumped for the sqlite option.

I was checking over the weekly database backups when I noticed that the bogofilter word list was 107MB in size. I wondered how fragmented it was likely to be, and soon found a utility I had never bothered to use before, bf_compact. How hard could it be to run this?  You run it against the folder that holds the database (and other files, if you use other features of bogofilter), and there are versions for Berkeley db and sqlite. I chose the sqlite version and it ran for a couple of minutes.

Then the error messages started.

Fortunately the utility's first job is to create a new version of the wordlist folder, so the .old directory still had the original files. I quickly copied that over, to get things working again, and took to search engines to find out what went wrong.  Nothing much was available, but one post suggested simply using the native sqlite VACUUM command. In my case, this was simply:

sqlite3 wordlist.db 'VACUUM;'

which reduced the file from 107MB to 85MB.  But more to the point, it reduced the average delay caused by running an email through bogofilter from around 4-5 seconds to 2-3 seconds.  This was the first time I had done any maintenance at all on the database since installing the system around July or August last year.
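For anyone wanting to make this a routine job, here is a minimal sketch in Python of the same idea. It is not what I ran, just an illustration: it copies the database aside first (the lesson of the bf_compact episode), checks integrity, runs VACUUM and reports the sizes. The function name and the .bak suffix are my own inventions, and it assumes nothing is writing to the word list while it runs.

```python
import os
import shutil
import sqlite3


def vacuum_with_backup(db_path):
    """Copy the database aside, run VACUUM, return (bytes_before, bytes_after)."""
    backup_path = db_path + ".bak"  # hypothetical backup name, pick your own
    shutil.copy2(db_path, backup_path)
    size_before = os.path.getsize(db_path)

    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # integrity_check returns a single row containing "ok" on a healthy file
        result = conn.execute("PRAGMA integrity_check;").fetchone()[0]
        if result != "ok":
            raise RuntimeError("integrity check failed: " + result)
        conn.execute("VACUUM;")
    finally:
        conn.close()

    return size_before, os.path.getsize(db_path)
```

Calling vacuum_with_backup("wordlist.db") from the bogofilter directory would return the two sizes for comparison. Bear in mind that VACUUM rebuilds the database into a temporary copy, so it can need roughly as much free disk space again as the file itself, on top of the backup.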

All in all, I still think bogofilter is a very good anti-spam solution for servers, even though its most common usage is at client level. I have it configured to maintain a word list globally, but it can be set up, and is perhaps best set up, to use per-user wordlists.  This allows it to be trained to an individual's requirements.  I once worked at a company where a good subset of users had legitimate requirements to send and receive emails with words that would definitely trigger conventional anti-spam systems. Bogofilter would have allowed those individuals to have their own anti-spam database, which might have been more effective.

How effective is this? Well, we have had an average of around 2000-3000 spam attempts a month, with some peaks and some troughs. That excludes smtp probe attempts, which are around 10 times that number, many of which are blocked by fail2ban. The postfix defences are pretty good, and it is rare for a spam message to get through both postfix and bogofilter. I would estimate about 2-5 a month at most when spammers change their techniques, but months can go by without ever seeing spam, other than messages correctly classified and automatically put into the spam folder. The few that do slip through are moved to the user's junk folder and are automatically reclassified, which helps with learning.

Addendum: I have now moved the root of the server to an SSD. This has further improved the performance of bogofilter. Regarding new spam types: should new forms of spam hit the inbox, it seems to take two examples of "learning", by moving the spam into the spam folder, for it and similar variants to be correctly classified.