As I am moving away from Gmail, I had to find a way to deal with the 21GB mail archive of the past 12 years.
Google lets you export all your data from its various services, including email. After a day or so you get a download link to a single file in MBOX format that contains all your mail.
MBOX is a plain text format, so you can search it directly, but that only tells you that what you are looking for is somewhere in that 21GB file.
I could also import it into my mail client as a local archive, by dropping the MBOX file into the Local Folders of Thunderbird with Finder. That gives me access and search capabilities similar to what I had for all that mail in Gmail. However, if I want to do more with my archive, such as mining it for things and re-using material by piping it into other workflows, having it in Thunderbird would not be enough.
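To give an idea of what scripting against the raw MBOX file looks like, here is a minimal sketch using the `mailbox` module from Python's standard library. The function name `find_subjects` and the file name in the usage note are my own placeholders. It only matches on the raw Subject header (encoded subjects are not decoded), and on a file of 21GB the initial scan will be slow, so treat it as an illustration rather than a solution.

```python
import mailbox


def find_subjects(mbox_path, term):
    """Yield (date, sender, subject) for messages whose Subject contains term.

    The match is case-insensitive and works on the raw header value only.
    """
    term = term.lower()
    # create=False: fail instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            subject = str(msg.get("Subject", ""))
            if term in subject.lower():
                yield (str(msg.get("Date", "")), str(msg.get("From", "")), subject)
    finally:
        box.close()
```

Calling `list(find_subjects("yourarchivename.mbox", "invoice"))` would return the matching messages as tuples, which is already more useful than knowing the word occurs somewhere in the file.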
Mailsteward puts MBOX into MySQL
So I searched for a way to open my archive up to search more radically. I came across DevonThink, but that seemed like overkill: it does much more than digest a mail archive, and as such overlaps too much with my Evernote. (I may rethink that in the future if I decide to move out of Evernote as well, since after Gmail it is my biggest third-party service containing lots of valuable information.) I looked for something simpler that does just what I need, putting e-mail into SQL, and that is how I found Mailsteward Pro.
There are three versions of Mailsteward, and I needed the Pro version, as it is the one that works with MySQL and thus can handle the volume of mail in my archive. It costs $99 one time, not cheap, but as I was paying for storage with Google as well, over time it pays for itself.
Installing Mailsteward
Mailsteward assumes you already have a MySQL server running on your system when you install it. I use MAMP Pro on my laptop as a local web and MySQL server, on which I run different things locally, like a blog-based journal and a self-assessment survey tool. MAMP Pro is very easy to install.
You need to take the following steps to give Mailsteward access to MySQL. In MAMP Pro you need to allow external access to MySQL, but only from within your own system (this basically means that applications other than MAMP can access the MySQL server).
Then you create a new database via the phpMyAdmin that comes with MAMP. Mailsteward will populate it with the right tables. In my case I aptly named it mailarchives.
Within Mailsteward you then add a connection, listing the database you created and adding the right ports etc. Note that the socket it requests isn’t an actual file on your system, but it does need to point to the right folder within the MAMP installation, which is the /Applications/MAMP/tmp/mysql folder.
Importing MBOX files
I first tested Mailsteward with my parents' e-mail archive, which I kept after they passed away last year to be able to find contact details of their friends. It imported fine. Then I tried to import my Gmail MBOX file. It turns out 21GB is too large for Mailsteward to handle in one go, as it eats up all the memory on your Mac. I concluded that I needed to split my Gmail MBOX file into multiple smaller ones.
Luckily there is a working script on GitHub that chops MBOX files up into smaller ones, and that lets you set the file size you want. I chopped the Gmail MBOX into 21 smaller files of 1GB each. These imported fine into Mailsteward. Mailsteward maintains tags and conversation threads.
To run the script, first open it in a text editor and change the filesize limit to what you want (default is 40MB, I changed it to 1GB). Then open Terminal and run the script by typing the following command, where the destination folder does not need to exist:
sudo php mbox_splitter.php yourarchivename.mbox yourdestinationfolder
That way you end up with a folder that contains all the smaller MBOX files.
Using Mailsteward's import feature you then add each of those files by hand (but luckily you only need to do that once).
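For those who would rather not run the PHP script, the splitting itself can be sketched in a few lines of Python. This is not a port of the GitHub script, just my own minimal take on the same idea: stream the file line by line and start a new part at the next message boundary once the current part has reached the size limit. The function name `split_mbox` and the `part_001.mbox` naming are made up for this example, and it assumes that a line starting with "From " after a blank line always marks a new message.

```python
import os


def split_mbox(src_path, dest_dir, max_bytes=1024 ** 3):
    """Split an MBOX file into parts of roughly max_bytes each.

    Parts are only cut at message boundaries, so a part can end up
    somewhat larger than max_bytes. Returns the list of part paths.
    """
    os.makedirs(dest_dir, exist_ok=True)  # destination need not exist yet
    part_paths = []
    out = None
    written = 0
    prev_blank = True  # the very first line counts as a boundary
    try:
        with open(src_path, "rb") as src:
            for line in src:  # streaming keeps memory use low
                starts_message = prev_blank and line.startswith(b"From ")
                if out is None or (starts_message and written >= max_bytes):
                    if out is not None:
                        out.close()
                    name = "part_%03d.mbox" % (len(part_paths) + 1)
                    path = os.path.join(dest_dir, name)
                    out = open(path, "wb")
                    part_paths.append(path)
                    written = 0
                out.write(line)
                written += len(line)
                prev_blank = line in (b"\n", b"\r\n")
    finally:
        if out is not None:
            out.close()
    return part_paths
```

Something like `split_mbox("yourarchivename.mbox", "yourdestinationfolder")` would then produce the same kind of folder with smaller MBOX files. Because the file is read as bytes and never loaded as a whole, the size of the archive should not matter much for memory use.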
Using the archive
Mailsteward allows you to search the archive through its rather simple and bland interface, but you can also tweak the MySQL queries it creates yourself. The additional advantage of having it in MySQL is that I can also access the archive with other tools to search it.
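As a first step towards using other tools on the archive, here is a small sketch that lists the tables Mailsteward created, so you can see what there is to query. I have not documented Mailsteward's schema here, so the sketch deliberately makes no assumptions about table or column names. The function `list_tables` is my own helper and works with any Python DB-API connection to MySQL. The commented connection example assumes the third-party PyMySQL package and uses placeholder credentials and a socket path in the MAMP folder mentioned earlier.

```python
def list_tables(conn):
    """Return the table names in the current database.

    conn is an open DB-API connection to a MySQL server whose default
    cursor returns rows as tuples. SHOW TABLES is MySQL-specific.
    """
    cur = conn.cursor()
    try:
        cur.execute("SHOW TABLES")
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


# Hypothetical usage with PyMySQL (pip install pymysql); the user,
# password and socket path are placeholders for a local MAMP setup:
#
#   import pymysql
#   conn = pymysql.connect(
#       unix_socket="/Applications/MAMP/tmp/mysql/mysql.sock",
#       user="youruser",
#       password="yourpassword",
#       database="mailarchives",
#   )
#   print(list_tables(conn))
#   conn.close()
```

From there you can inspect each table and write your own queries against whatever columns Mailsteward turns out to use.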
Adding newer mail to the archive
Thunderbird allows me to export e-mail as MBOX files via the Import/Export add-on, and Mailsteward can then add those files to the archive. So that’s a straightforward operation. I can probably automate it and schedule it to run every month.