I am working on following somewhat in Peter’s footsteps. He has his FreshRSS reader connected to his weblog, for everything he ‘stars’ in his feed reader. Those get posted to his ‘favourites’ stream, and send webmentions so they end up as ‘likes’ underneath the original posts. See Peter’s description on his blog.

My plans are a bit more ambitious, an ambition that may yet well bump into the limtis of my php / mysql capabilities.

I want to be able to mark articles in the feed reader for three things: bookmarking, favouriting, and replies. Then I want to add two pieces of content, and post that to my site.

When I favourite something on my blog, I basically always add 2 things when posting it: a rationale for why I like something, and a quote from the original article that meant something to me. See the image below, a screenshot of a ‘favourite’ I marked earlier this month.

That post is made of a short template that I now activate when in my blog’s editor with a hotkey (using AlfredApp) that creates the star, and has a place for the URL, title and author, plus my own motivation for posting, and a quote with the author’s name repeated.
Favourites, bookmarks and replies follow the same template. The only difference is the symbol shown, and the microformat used to signal to the original article what sort of webmention I’m sending their way (so they know how to treat it, e.g. as a like or as a comment/reply)

Peter’s example of favouriting by starring leads to setting a boolean field in the content table itself of the database.
FreshRSS lacks three distinct ‘starring’ buttons, but I can easily add labels to a posting. Those labels are stored in a separate table from the feed articles, unlike when starring.

The first step to take then is to gain access to the postings I mark for follow-up during feed reading. I’ve used the following SQL query:

SELECT ton_entry.link, ton_entry.author, ton_entry.title, ton_entrytag.id_tag, uncompress(ton_entry.content_bin) as content FROM ton_entry JOIN ton_entrytag ON ton_entry.id=ton_entrytag.id_entry

The two key differences with Peter’s SQL statement are, the JOIN part, and the uncompress part. The JOIN statement combines the table that knows what labels I applied with the table that contains all articles, and selects only those where an article ID is in both tables, resulting in a list of the articles I applied labels to. Because in a later step I want to select a quote from the source article the SQL statement also grabs the content of an article. That content is stored as a compressed binary blob (yes, blob is its official name) in the database. Using ‘uncompress’ makes the content blob human readable again, and the ‘as content’ bit puts it in a variable called content.

The next step is allowing me to provide my remarks and select a quote through a form, and the third step to add all that in to the right template based on the label (favourite, bookmark, reply), after which I need to put it into my WordPress install in the right categories and publish it.

As I am moving out of Gmail, I had to find a way to deal with the 21GB mail archive from the past 12 years.

Google lets you export all your data from its various services, including email. After a day or so you get a download link that contains all your mail in one single file in MBOX format.

MBOX is a text format so it allows itself to be searched, but that would only tell you that what you are looking for is somewhere in that 21GB file.

I could also import it into my mail client as a local archive, by dropping the MBOX file in the Local Folder of Thunderbird with Finder. That provides me with a similar access and search capability as I had for all that mail in Gmail. However, if I would like to do more with my archive, mine it for things, and re-use stuff by piping it into other workflows having it in Thunderbird would not be enough.

Mailsteward puts MBOX into MySQL
So I searched for a way to more radically open my archive up to search. I came across DevonThink, but that seemed a bit overkill as it does so much more than merely digesting a mail archive, and as such provides way too much overlap with my Evernote. (Although I may rethink that in the future, if I decide to also move out of Evernote, as after Gmail it is my biggest third party service that contains lots of valuable information.) I looked for something simpler, that just does what I need, putting e-mail into sql, and that is how I found Mailsteward Pro.

There are three versions of Mailsteward, and I needed the Pro version, as it is the one that works with MySQL and thus can handle the volume of mail in my archive. It costs $99 one time, not cheap, but as I was paying for storage with Google as well, over time it pays for itself.

Installing Mailsteward
When installing Mailsteward it assumes you already have a MySQL server running on your system. I use MAMP Pro on my laptop as a local web and mysql server, on which I run different things locally, like a blog based journal and a self-assessment survey tool. MAMP Pro is very easy to install.

You need to take the following steps to allow Mailsteward access to MySQL. In MAMP Pro you need to allow external access to MySQL, but only from within your own system (this basically means applications other than MAMP can access the MySQL server.

Schermafbeelding 2016-07-19 om 16.37.07

Then you create a new database via the PHP Mysqladmin that comes with MAMP. Mailsteward will populate it with the right tables. In my case I aptly named it mailarchives.

Schermafbeelding 2016-07-19 om 10.48.16

Within Mailsteward you then add a connection, listing the database you created, and adding the right ports etc. Note that the socket it requests isn’t an actual file on your system, but does need to point to the right folder within the MAMP installation, which is the Application/MAMP/tmp/mysql folder.

Schermafbeelding 2016-07-19 om 08.41.51

Importing MBOX files
I first tested Mailsteward with my parents e-mail archive that I kept after they passed away last year, to be able to find contact details of their friends. It imported fine. Then I tried to import my Gmail MBOX file. It turns out 21GB is too large to handle in one go for Mailsteward, as it eats away all memory on your Mac. I concluded that I need to split my Gmail MBOX file into multiple smaller ones.

Luckily there is a working script on GitHub that chops MBOX files up in smaller ones, and that allows you to set the filesize you want. I chopped the Gmail MBOX into 21 smaller files of 1GB each. These imported ok into MailSteward. Mailsteward maintains tags and conversation threads.

To run the script, first open it in a text editor and change the filesize limit to what you want (default is 40MB, I changed it to 1GB). Then open Terminal and run the script by typing the following command, where the destination folder does not need to exist:

sudo php mbox_splitter.php yourarchivename.mbox yourdestinationfolder

terminalcommand

That way you end up with a folder that contains all the smaller MBOX files:

Schermafbeelding 2016-07-22 om 16.06.53
Using Mailstewards import feature you then add each of those files, by hand (but luckily you only need to do that once).

Using the archive
Mailsteward allows you to search the archive through its rather simple and bland interface, but you can also tweak the MySQL queries it creates yourself. The additional advantage of having it in MySQL is that I can also access the archive with other tools to search it.

Schermafbeelding_mailsteward

Adding newer mail to the archive
Thunderbird allows me to export e-mail as MBOX files via the Import/Export add-on, which can then be added to the archive by Mailsteward. So that’s a straightforward operation. Likely I can automate it and schedule it to run every month.