In a tweet Mike Haber mentioned Otter.ai, a spoken text transcription tool, in the context of making notes (in Obsidian.md). Taking a look at the Otter.ai website I tried to create an account, only to be told that the unique email address I entered was already tied to an existing account. Indeed, my 1Password contained a login that I created in March 2018, but never used. Despite, or maybe because of, the friction I feel using audio, I decided to try it out now.
I tried three things.
One, where I spoke to my laptop while seeing the transcription written out live in front of me. This worked well, but created an odd feedback loop of self-consciousness as I read back my own words while speaking them. It’s like using a mirror to guide your hand movements, but for speech.
Two, where I recorded myself talking using QuickTime and uploaded the resulting sound file. This removed the strange feedback loop of seeing the text emerge while talking, but had me sitting behind my laptop and manually uploading a file afterwards.
Three, where I used the service’s Android app to dictate to my phone while walking around the house. This felt the most natural of the three.
Resulting transcripts can be manually exported from the browser interface in various formats, including plain text, or copied to the laptop’s clipboard. An automatic export to txt would be nice to have. Otter.ai only does English (and does it well), which isn’t an issue when I’m in an English language context, but otherwise dictating in English quickly feels artificial to my own ears.
From my brief tests three cases stood out for me that I can get comfortable with:
- dictating short ideas or descriptions while on the move around the house
- stream of consciousness talk, either while walking around the house or stationary
- describing an object as I handle it, specifically physical books as I first go through them to see what they are about, in preparation for reading.
Otter.ai has a generous free tier of 10 hours per month and three free uploads (I assume the idea behind that is that they get more data to train their algorithms with), but the next tier up (‘pro’) gives you ten times that per month, with unlimited uploads within those 100 hours, for $100 USD / yr. That is, I think, a pretty good deal, especially compared to other services.
Differences between Otter.ai and other services I found online concern 1) real time audio capture and transcription, where others mostly just provide for uploads of audio files, 2) costs, where others charge by the minute and/or generally charge much more, 3) available languages, where Otter.ai only provides English, and others cater to a wide range of languages.
All the services I looked at allow listening to audio while you go through a transcription, e.g. to add corrections.
Two European services I found are Amberscript (a Dutch company), which has a prepaid option of 15 Euro / hour (or 40 Euro per month subscription for 5 hours), and Happyscribe (a French company) which charges by the minute at 12 Euro / hour.
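To put the quoted prices side by side, here is a rough sketch that works out an effective price per hour of audio for each option. It assumes the full allowance of a subscription actually gets used, and it leaves dollars and euros unconverted; the `per_hour` helper and the `rates` table are just illustrative names, and the figures are only those mentioned in this post.

```python
# Prices as quoted in this post; currencies are deliberately not converted.

def per_hour(price, hours):
    """Effective price per hour of audio, assuming the full allowance is used."""
    return price / hours

# name -> (currency, effective price per hour of audio)
rates = {
    # $100 per year for 100 hours per month
    "Otter.ai pro": ("USD", per_hour(100, 100 * 12)),
    # prepaid, charged per hour
    "Amberscript prepaid": ("EUR", 15.0),
    # 40 Euro per month for 5 hours
    "Amberscript subscription": ("EUR", per_hour(40, 5)),
    # charged by the minute, quoted per hour
    "Happyscribe": ("EUR", 12.0),
}

for name, (currency, rate) in rates.items():
    print(f"{name}: {rate:.2f} {currency} per hour")
```

On these numbers the Amberscript subscription comes to 8 Euro per hour if all five hours are used, and Otter.ai's pro tier to well under a dollar per hour, though few people will dictate anywhere near 100 hours a month.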
There is of course also the dictation built into Microsoft Word. Word supports Dutch well. Although I normally work in LibreOffice, I do have Word installed to prevent weird conversion issues when working on documents with clients who run MS products. It does mean being tied to the laptop while dictating though, and of course, as with any other US company, including Otter.ai, all audio goes to US servers for speech recognition. Also, after the dictation there’s no audio file left over, only the document remains. This means that odd transcriptions can remain a mystery, because you can’t go back to the original, so you should make such corrections immediately. After such a correction phase this is no longer an issue; it’s then just a difference with other services, which are designed more towards transcription of e.g. interviews, where MS Word is geared towards dictation. In the web-based Word version there’s a transcription feature separate from the dictation feature, which provides 300 minutes for free per month and does retain the audio file for you.
For now I will aim to experiment with voice dictation some more. Probably for the first few days using MS Word on my laptop for dictation, and using Otter.ai’s mobile app for the same, in the three mentioned use cases. If I find it gets more useful than strange (as I’ve found it to be in previous years and attempts), I will likely use Amberscript, as it is EU based and has a mobile app. Their prepaid option of 15 Euro / hour is probably good for quite some time at first.
I really liked using Otter.ai and burned through maybe 3 different Google accounts hitting the limits. What makes it stand out is the transcription editor, where you can click words and hear the audio at that point.
Also check out http://rev.ai (I thought I read somewhere that they use Otter under the hood). I used it on a Spanish language podcast recording and it did well. They support up to 26 languages.