Donald Clark writes about the use of voice tech for learning. I find I struggle enormously with voice. While I recognise several aspects put forward in that post as likely useful in learning settings (auto transcription, text to speech, oral traditions), others remain barriers to adoption for me.

For taking in information as voice: podcasts are mentioned as a useful tool, but they don't work for me at all. I get distracted after about 30 seconds. The voices drone on, and there's often tons of fluff while the speaker tries to get to the point (often a lack of preparation, I suppose). I don't have the moments in my day that I know others use to listen to podcasts: walking the dog, sitting in traffic, going for a run. Reading a transcript is very much faster, also because you get to skip the bits that don't interest you, or reread sections that do. You can't do that when listening, because you don't know when an uninteresting segment will end, or when it might segue into something of interest. And then you've listened to the end and can't get those lost minutes back. (Videos have the same issue, or rather, I have the same issue with videos.)

For using voice to ask or control things: there are obvious privacy issues with voice assistants. Having active microphones around, for one. Even if they are supposed to only fully activate upon the wake word, they get triggered by false positives. And they don't distinguish between me and other people they maybe shouldn't respond to. A while ago I asked around in my network how people use their Google and Amazon microphones, and the consensus was that most settle on a small range of specific uses. For those uses, cloud processing of what the microphones record in your living room shouldn't be needed; they could be handled locally, with only novel questions or instructions being processed in the cloud (a sketch of that pattern follows below). (Of course that's not the business model of these listening devices.)
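To make that local-first pattern concrete, here is a minimal sketch in Python. Everything in it (the command list, the cloud fallback) is hypothetical, not any vendor's actual code, but it shows the shape of the idea: a fixed set of household commands gets resolved on the device, and only unrecognised requests would ever leave the house.

# Hypothetical sketch: resolve a small, fixed set of commands locally,
# and only fall back to a remote service for utterances we don't know.

LOCAL_INTENTS = {
    "turn on the lights": lambda: print("lights: on"),
    "turn off the lights": lambda: print("lights: off"),
    "what time is it": lambda: print("(read the clock locally)"),
}

def send_to_cloud(utterance: str) -> None:
    # Placeholder for the privacy-sensitive remote round trip.
    print(f"novel request, cloud processing needed: {utterance!r}")

def handle(utterance: str) -> None:
    action = LOCAL_INTENTS.get(utterance.strip().lower())
    if action:
        action()  # handled on-device; nothing leaves the living room
    else:
        send_to_cloud(utterance)  # only novel questions go out

handle("Turn on the lights")
handle("How tall is the Eiffel Tower")

The point of the sketch: the frequent, predictable uses people actually settle on never need to touch anyone else's servers.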

A very different factor in using voice to control things, or for instance to dictate, is self-consciousness. Switching on a microphone in a meeting usually has a silencing effect. As for dictation: I won't dictate text to software at a client's office, or in public (like on a train). Nor will I talk to my headset while walking down the street. I might do it at home, but only if I know I'm not distracting others around me.

In the cases where I did use dictation software (which nowadays works remarkably well), I found it clashes with my thinking and formulating. Ultimately it's easier for me to shape sentences on paper or screen, where I see them take shape in front of me. When I dictate, it easily descends into meaninglessness, and it's impossible to structure. Stream-of-thought dictation is the only mode that somewhat works, but it needs a lot of cleaning up afterwards. Judging by all the podcasts I've sampled over the years, this happens to more people when confronted with a microphone (see the remarks on podcasts above). It might be different for something more prepared, like a lecture or presentation, but those types of speech have usually been prepared in writing, so there is likely a written source for them already. In any case, dictation never saved me any time. It is of course very different if you don't have the use of your hands: then dictation is your door to the world.

It makes me wonder: how are voice services helping you? How do they save you time or effort? In which cases are they more novelty than effective?

This reads like what I have been discussing with Peter: a voice system that is not beholden to a tech giant and processes data locally.

(I do still need to change the way this bookmark is presented in the blog)

Bookmarked Update 23: The Mycroft Personal Server - Starting the Conversation · Mycroft Mark II: The Open Voice Assistant by Steve Penrod (Kickstarter)
From our CTO, Steve Penrod: In my July post where I introduced the Mycroft Roadmaps, I laid out plans for a Mycroft Personal Server. I've had conversations with many about the concept, but exact goals and designs haven't been established yet. I see this as a highly Community-centric project, so I'd like to start a conversation so we can all get on the same page.

What is it?

Mycroft is inherently modular, allowing pieces of the system to be moved around as appropriate for the specific implementation. Up to this point, the typical implementation runs the majority of Mycroft on a device such as a Mark 1, Raspberry Pi, or laptop. This includes Wake Word processing, intent parsing, and text to speech (TTS) (more on Mimic2 TTS below).

For normal operation, there is one critical piece that isn't included in that list -- Speech to Text (STT). The typical Mycroft device today uses Mycroft's Home to perform the STT operation. This is automatic and invisible to most users. Home also provides:

- Mimic2 for those using this new voice technology
- General user settings
- Web interface to specific Skill Settings
- The Marketplace to find and install new skills

In my view, the Personal Server would provide some version of all of these services. It should allow a household to run all of their Mycroft equipment without any network necessary until a Skill needs to access the internet to retrieve information. This means the personal server would at minimum need to run Speech to Text (DeepSpeech), Text to Speech (Mimic), and provide a configuration web interface.

Why would anyone need this?

There are several very good reasons for implementing this capability.

- Slow, unreliable internet - I'm personally spoiled by Google Fiber here in Kansas City and forget that not everyone in the world has gigabit connection speeds.
- Limited or expensive internet - Similar to the above, but slightly different motivation.
- No internet - Yes, this exists. Imagine locations in the mountains, on boats, or the far side of the moon.
- Privacy concerns - Every time data leaves your control, there is a possibility of others misusing it, not safeguarding it adequately, or it being intercepted.

For those willing to accept the responsibility, keeping all operations within a home provides the ultimate in reliability and security.

What a Personal Server Isn't

The Personal Server is intended to be Personal -- not Enterprise Grade. The main reason for this is simplicity. For example, if you don't have to perform Speech to Text requests for thousands of users, the odds of collision are very low. That means STT requests can be run sequentially instead of requiring a bank of STT servers that can handle a high load.

A Personal Server also isn't for everyone. You don't have to be a High Geek, but it will require some significant computational resources, like an always-on PC with a high-quality GPU.

Does this mean Home is no longer needed?

No, for several reasons. Firstly, many people will still want the convenience of just plugging in their device and running it. No worries about setting up a server, no challenges accessing the web UI from their phone without firewall magic, etc. It just works.

Second, there is still value in having a central collaboration hub. Mycroft has always been about communal efforts, and Community requires gathering places. Home provides a place to:

- Share and assist in tagging data to advance the technology
- Discover new information
- Download voices and skills from others
- Provide a gateway to access other Mycroft devices and Mycroft-hosted services

Your Thoughts?

All of the above are my thoughts. But as I said at the beginning, I want this to be a conversation. What do you want and see for the Mycroft Personal Server? Are there concerns I'm overlooking? Would you like to be involved in the building of this, taking control of your own fate?

Please join us on our Community Forum to participate in the conversation!
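For a sense of what "no network necessary" could look like in practice: Mycroft devices read their settings from a JSON file (mycroft.conf), and something along these lines could point a device at speech services on your own network instead of Mycroft's Home. The exact keys and the local DeepSpeech endpoint here are my assumptions, a sketch rather than the project's documented configuration:

{
  "stt": {
    "module": "deepspeech_server",
    "deepspeech_server": {
      "uri": "http://personal-server.local:8080/stt"
    }
  },
  "tts": {
    "module": "mimic"
  }
}

With speech-to-text handled by a box in the house and Mimic text-to-speech running on the device itself, nothing would leave the local network until a skill actually needs the internet. That is exactly the kind of setup Peter and I have been talking about.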