As I am moving out of Gmail, I had to find a way to deal with the 21GB mail archive from the past 12 years.

Google lets you export all your data from its various services, including email. After a day or so you get a download link that contains all your mail in one single file in MBOX format.

MBOX is a plain-text format, so it can be searched directly, but that would only tell you that what you are looking for is somewhere in that 21GB file.
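To get from "somewhere in the file" to an actual message, the file has to be parsed rather than merely grepped. Here is a minimal sketch of that idea, assuming Python 3 and its standard-library `mailbox` module. The function name `find_messages` and the choice to match on the subject line are hypothetical, purely for illustration, and this is not how Thunderbird or Mailsteward work internally.

```python
import mailbox


def find_messages(mbox_path, needle):
    """Return (key, sender, subject) for every message whose subject contains needle."""
    needle = needle.lower()
    hits = []
    # mailbox.mbox scans the file once to locate message boundaries,
    # then hands back one parsed message per key.
    for key, msg in mailbox.mbox(mbox_path).iteritems():
        subject = str(msg.get("Subject", ""))
        if needle in subject.lower():
            hits.append((key, str(msg.get("From", "")), subject))
    return hits
```

Calling `find_messages("yourarchivename.mbox", "invoice")` would list the matching messages with their position in the archive. On a file of 21GB the initial scan will take a while, which is one more reason to look for a sturdier solution.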

I could also import it into my mail client as a local archive, by dropping the MBOX file into the Local Folders directory of Thunderbird with Finder. That gives me access and search capabilities similar to what I had for all that mail in Gmail. However, if I want to do more with my archive, such as mining it for things and re-using material by piping it into other workflows, having it in Thunderbird would not be enough.

Mailsteward puts MBOX into MySQL
So I searched for a way to open my archive up to search more radically. I came across DevonThink, but that seemed a bit overkill as it does so much more than merely digesting a mail archive, and as such overlaps far too much with my Evernote. (Although I may rethink that in the future, if I decide to also move out of Evernote, as after Gmail it is my biggest third party service that contains lots of valuable information.) I looked for something simpler that just does what I need, putting e-mail into SQL, and that is how I found Mailsteward Pro.

There are three versions of Mailsteward, and I needed the Pro version, as it is the one that works with MySQL and thus can handle the volume of mail in my archive. It costs $99 one time, not cheap, but as I was paying for storage with Google as well, over time it pays for itself.

Installing Mailsteward
Mailsteward assumes you already have a MySQL server running on your system. I use MAMP Pro on my laptop as a local web and MySQL server, on which I run different things locally, like a blog-based journal and a self-assessment survey tool. MAMP Pro is very easy to install.

You need to take the following steps to give Mailsteward access to MySQL. In MAMP Pro you need to allow external access to MySQL, but only from within your own system (this basically means that applications other than MAMP can access the MySQL server).


Then you create a new database via phpMyAdmin, which comes with MAMP. Mailsteward will populate it with the right tables. In my case I aptly named it mailarchives.


Within Mailsteward you then add a connection, listing the database you created and filling in the right port and other settings. Note that the socket it requests isn’t an actual file on your system, but does need to point to the right folder within the MAMP installation, which is the /Applications/MAMP/tmp/mysql folder.


Importing MBOX files
I first tested Mailsteward with my parents’ e-mail archive, which I kept after they passed away last year to be able to find contact details of their friends. It imported fine. Then I tried to import my Gmail MBOX file. It turns out 21GB is too large for Mailsteward to handle in one go, as it eats away all memory on your Mac. I concluded that I needed to split my Gmail MBOX file into multiple smaller ones.

Luckily there is a working script on GitHub that chops MBOX files up into smaller ones, and that allows you to set the file size you want. I chopped the Gmail MBOX into 21 smaller files of 1GB each. These imported fine into Mailsteward. Mailsteward maintains tags and conversation threads.

To run the script, first open it in a text editor and change the filesize limit to what you want (default is 40MB, I changed it to 1GB). Then open Terminal and run the script by typing the following command, where the destination folder does not need to exist:

sudo php mbox_splitter.php yourarchivename.mbox yourdestinationfolder


That way you end up with a folder that contains all the smaller MBOX files:

Using Mailsteward’s import feature you then add each of those files by hand (but luckily you only need to do that once).
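For readers who would rather not run PHP, the splitting step can be sketched in Python as well. This is not the GitHub script mentioned above, only a rough illustration of the same idea under a few assumptions: messages start at lines beginning with `From ` (as in a well-formed MBOX export), and the names `split_mbox`, `max_bytes` and `part_001.mbox` are hypothetical. It streams the file line by line, so the full 21GB never has to fit in memory.

```python
import os


def split_mbox(src_path, dest_dir, max_bytes=1024 ** 3):
    """Split an mbox file into parts of roughly max_bytes, cutting only at message starts."""
    os.makedirs(dest_dir, exist_ok=True)  # the destination folder need not exist yet
    parts = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Assumption: a line starting with "From " marks the start of a new message.
            starts_message = line.startswith(b"From ")
            if out is None or (starts_message and written >= max_bytes):
                if out is not None:
                    out.close()
                part_path = os.path.join(dest_dir, f"part_{len(parts) + 1:03d}.mbox")
                parts.append(part_path)
                out = open(part_path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts
```

A call such as `split_mbox("yourarchivename.mbox", "yourdestinationfolder")` returns the list of part files it wrote. Because a part is only closed at a message boundary, each part can end up slightly larger than the limit.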

Using the archive
Mailsteward allows you to search the archive through its rather simple and bland interface, but you can also tweak the MySQL queries it creates yourself. The additional advantage of having it in MySQL is that I can also access the archive with other tools to search it.
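To give an idea of what having mail in a database makes possible, here is a small sketch. It makes two loud assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server, and the `mail` table with its four columns is a hypothetical schema, not the one Mailsteward creates. The helper names `plain_text`, `load_mbox` and `top_senders` are likewise made up for illustration.

```python
import mailbox
import sqlite3


def plain_text(msg):
    """Return the first text/plain part of a message as a string, or ''."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the mail header
            return payload.decode("utf-8", errors="replace")
    return ""


def load_mbox(mbox_path, db):
    """Copy sender, subject, date and plain-text body of each message into a table."""
    # Hypothetical schema, NOT the tables Mailsteward sets up.
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail "
        "(sender TEXT, subject TEXT, sent TEXT, body TEXT)"
    )
    rows = (
        (str(msg.get("From", "")), str(msg.get("Subject", "")),
         str(msg.get("Date", "")), plain_text(msg))
        for msg in mailbox.mbox(mbox_path)
    )
    db.executemany("INSERT INTO mail VALUES (?, ?, ?, ?)", rows)
    db.commit()


def top_senders(db, limit=10):
    """Return (sender, message count) pairs, busiest correspondents first."""
    return db.execute(
        "SELECT sender, COUNT(*) AS n FROM mail "
        "GROUP BY sender ORDER BY n DESC, sender LIMIT ?",
        (limit,),
    ).fetchall()
```

With `db = sqlite3.connect(":memory:")` and one of the 1GB part files loaded through `load_mbox`, a question like "who mailed me most?" becomes a one-line query. The same kind of SQL could in principle be pointed at the MySQL archive, once you have looked up the actual table and column names there.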


Adding newer mail to the archive
Thunderbird allows me to export e-mail as MBOX files via the Import/Export add-on, which can then be added to the archive by Mailsteward. So that’s a straightforward operation. Likely I can automate it and schedule it to run every month.

Leaving Gmail, a tough question
In the past two years I have been slowly reconfiguring my online routines to increase privacy safeguards and bring more of my data under my own control, without making my work routines more difficult and thus less routine. How to create an e-mail workflow that does not rely on Gmail has been the hardest part of this effort. I think I have now finally figured out how to do it without loss of convenience, and hope to have made the switch after I finish exporting all the e-mail data Google has on me.

[Image: my Gmail inbox. After 12 years this will no longer be a familiar sight for me.]

Previous steps I took
Some things I already did to increase my control over my own data are:

That is not to say I use nothing but my own stuff now. I am still a heavy user of various services, like Evernote for instance, or my Android phone. But my usage of third party services has become more varied and spread out, reducing the impact of losing any one of them.

Why I want to leave Gmail
The net is a distributed place, and our information strategies and routines should embrace that distributedness. In practice however we often end up in various silos and walled gardens, because they are so very convenient to use, although they actually decrease our own control and/or introduce single points of failure. If your Facebook account gets suspended can you still interact with others? If your Google account gets suspended, do you still know how to reach people? Using Gmail also means all of my stuff resides on servers falling under the not very privacy sensitive US laws.

Since July 2004, however, I have completely relied on Gmail. It is an easy way to combine the various e-mail addresses I use into one single inbox (or rather multiple inboxes on the basis of follow-up actions), and it has great tagging, search and filtering, so that you never need to file anything or sort into folders. I have used Gmail as my central inbox for everything. Since 2004 I have accumulated about 770,000 emails in 249,000 conversations, for a total of 21GB. Gmail is therefore the largest potential single point of failure in my information processing.

The issues to solve
To wean myself off Gmail there were several things for which I needed a similarly smooth working alternative:

  • All the mail addresses I use need to come together into a single mailbox, and conversations need to be threaded
  • Availability across devices, and via webmail. Especially on the road I use my phone for quick e-mail triage, and as an alternative to phone calls. Webmail is my general purpose access point on my laptop while traveling
  • Having access to my full mail archive for search and retrieval
  • Excellent tagging and filtering possibilities

The steps I took to leave Gmail
Finding a path away from Gmail took two realisations, one about process and one about technology.

Changing my process
Concerning process, I realized that Gmail allows me, or even invites me, to be very lazy in my e-mail processing routines. Because of the limitless storage I merely needed to be able to find things again (through the use of tags for instance), and never needed to really decide what to do with an e-mail.

This means for instance that lots of attachments only live on in my mailbox, without me adding them to relevant project documentation etc. I have likely spent hours in the past years searching for slide decks in my mountain of e-mail, instead of spending half a minute once to store an attachment in a more logical place, where I’m more likely to find it with desktop search or serendipitously bump into it, and then throwing the mail message out. So mail processing has to become a much less lazy process, with a few more active decisions in handling messages: attachments go into a project folder, contact info into contacts, bookkeeping-related messages to bookkeeping (no longer going through all mail tagged bookkeeping every quarter to do my taxes), and tasks and actions to my Things todo application. I already wrote several AppleScripts to let my todo app and Evernote talk to various other software packages (like Tinderbox), and it is now likely I will write a few more to automate mail message processing further (because I prefer to still keep my process as lazy as possible).

Changing my tools
A second key realization was that my original reasons for staying within webmail had meanwhile been solved with better technology: it used to be that only Gmail provided the cross-device access to all my mail accounts simultaneously, something I could not easily do in 2004 with a desk/laptop mail client in combination with a mobile mail client. Now, with much broader IMAP support (not just by my software tools, but also by hosting companies) this is much easier, increasing the range of possible alternatives. Threading mail conversations is now also a more universal feature.

This allowed me to start using the Thunderbird mail client, including PGP encryption, on my laptop (I had never used a mail client intensively on my laptop before), in combination with the open source K9 Android mail app (replacing the Gmail app for me), which also has encryption options. Both allow tagging of messages, and Thunderbird allows filtering not just of incoming mail but also when sending and when archiving, which is really useful.

As an alternative to piping all my mail accounts into Gmail, I now use all the real inboxes of those mail accounts where they’re originally hosted, and use IMAP to combine into one user interface on my laptop and mobile. Those separate mailboxes do have lower storage limits (usually 500MB), so it is more likely I bump into limits, and that is the reason I need a much less lazy mail processing routine (especially concerning larger attachments), in which I can regularly archive older mail.

Separately I also now use a different webmail provider, Protonmail in Switzerland, that comes with default encryption. I’ve attached a domain name to it (zylstra.eu).

The archiving issue
The above shows how leaving Gmail can be done from here on, by changing process and tools to solve the one-inbox and multiple-device issues. That leaves the question of how to deal with the 21GB of mail archive from the past 12 years. Leaving it all in Gmail and using that as an archive might be a work-around for old mail, but doesn’t help me for future mail. I could add it as a local folder to the Thunderbird mail client, but that thought did not appeal to me and feels clunky. I find that I never use my mail archive from my mobile, so the archive does not need to be cloud based per se. So I opted to keep my mail archive local, by storing it in a MySQL database. This allows for query-based searches, and even text mining, without clogging up my mail client itself. Gmail can export your archive in a single MBOX file, and I used Mailsteward Pro to transform it into a MySQL database. (More on that set-up in the next posting, Archiving mail in mysql with MAMP and Mailsteward.) With the archive now stored locally, the database is backed up to both my NAS drive and my VPS.

What remains
With the basic set-up for leaving Gmail now in place, there is still work to be done over the coming months. Clearing out the archive at Gmail is one step, once I feel comfortable with searching my new MySQL archive. Creating more filters in my mail client, and writing a few scripts to integrate my mail processing with the other tools I use, is another. There are also likely a whole bunch of things (accounts, subscriptions etc.) that use my Gmail address, which I will change as I go along.

My longtime blogging friend Roland Tanglao suggested mining my mail archive for things that could be published, for contact data, and for old ideas that can feed into my work now. This sounds appealing but needs some contemplation and then a plan. Having the archive in MySQL makes it a lot easier to come up with a plan though.

Beyond mail, there are of course more Google services I use heavily, especially Calendar, which are tied to my Gmail address. I could move that to my ownCloud as well. I will keep my Google account, as this isn’t about ditching Google but about reducing risks and taking more control. Apart from Calendar there are no other single points of failure in the way I use my Google account. Beyond Google, Evernote is another silo I’m heavily invested in, and the content I keep there is arguably more valuable to me than my Gmail. So that is a future change to think about and seek alternatives for.

Inbox 0 is for Losers
I reached Inbox -1 on Gmail once in 2009 🙂

[Find the outline and slides of my Koppelting session on leaving Gmail in the follow-up posting at https://tzyl.eu/leavegarden. You can use the shortlink https://tzyl.eu/gmail to refer to this posting.]