Yesterday Martijn Aslander demonstrated the personal information tools he recently created. I came away inspired. Perhaps not by the tools as such, but more because of the pathways of thinking they opened. And because everything was so blazingly fast. All vibe-coded, as he has no coding skills himself.
I can see how the way his brain works is aided by the structure and availability of information his personal toolbox provides him. I myself, however, would be more interested in shaping a personal tool like this towards facilitating my processes and habits, and letting me work towards actual outputs.

The Digitale Fitheid community (a Dutch language platform, with Digital Fitness as its English language counterpart) has a monthly face-to-face meet-up in Utrecht, and yesterday was this year’s last. In the morning E had attended a session by Frank Meeuwsen on how to use Claude Code to quickly build something from scratch. In the evening I joined Martijn’s session on what he is calling his ‘Theta OS’.

Some observations.

  • Martijn’s Theta is mostly a dashboard on local information. It shows him lots of different pieces of information at a glance. Each of these things, hotel bookings, books, payments, tasks, quantified self measurements and more, can live and be accessed in their own little apps and silos, but he uses the dashboard to combine them in context. At the outset he said that having his own established ontology (in the information sense, not the philosophical) was a prerequisite. That sounds very true, as the purpose here is having an extremely personal tool. The value is in combining various information sources on purely personal criteria on the fly.
  • His tool stack is sqlite (a lightweight database, installed by default on my Mac), with node.js (to run javascript), and regular html and css for the front-end, served by the local webserver on his laptop. I didn’t have node.js on my Mac, so I installed it now to be able to try some things.
  • He is not a coder, so everything is vibe coded with Claude Code. And while that does enable him to quickly create things, he spent some 500 hours in Claude Code in the past months. Makes me wonder what he could have done in those 500 hours if he hadn’t used it. I do recognise that given his nature, an organised path of exploration and learning would not have been feasible, though it might well have resulted in a similar proof of concept after 500 hours.
  • Because of this he wasn’t really able to discuss the results conceptually, beyond what it does on the front-end and what it means to him. When asked about the architecture of the tool, he therefore asked Claude Code to whip up a description.
  • In working with Claude Code he did not feed it his personal information, but abstracted structures. E.g. to incorporate a CSV with personal information he would provide the structure and a bit of dummy data, to get a parser or importer and to change the database structure accordingly. He would then run the importer on the actual data outside of Claude. (A minimal sketch of that pattern follows after this list.)
  • To Martijn, Theta is for surfacing and combining little pieces of data and information. He also uses markdown notes a lot (with Obsidian as viewer), but Theta keeps all the small pieces out of his notes. Only when he combines things into something more informational does he bring it into his markdown notes. I find this distinction makes sense, as I am usually averse to ‘make Obsidian do everything for me’ type of efforts. I use several tools that work on my Obsidian notes but do not attempt to be part of Obsidian. Largely absent from yesterday’s demo and discussion was the other way around: getting small bits out of Obsidian into his dashboard.
  • The entire thing as it is now is a tool that clearly and visibly had an evolutionary path, as opposed to a planned-for structure and design. This appeals to me a lot. It is the same with my own personal tools and system of notes. Others sometimes remark on how it would be impossible for them to create something like it for themselves. Thing is, neither could I. The current state evolved over time, and does not lend itself to reconstruction. That this sense of evolution already stands out after only a few months, in which Martijn spent those 500 hours in total on his Theta OS, is to me a strong argument in favour of his approach.
  • This is reinforced by how he clearly builds intensively on his own structures and habits. As I often remark too, I am predictable to myself, and it means any software tool you build for yourself can make choices based on that predictability. If I want to save something I know which attributes I care about, and in which form I want to have them available. If I make a shopping list I know the order of the supermarket shelves of the store I’ll visit. If I’m near a Dutch railway station in the evening, it is most likely I intend to take a train home, that type of thing. The same is true for my information strategies. I know where I store my book notes and how, as I’ve been doing it for ages etc.
  • Building on that predictability he makes functionalities in Theta highly contextual. If he bookmarks a LinkedIn profile, it means he wants a person note with a few distinct fields from the profile (e.g. current role and location), and bookmarking then means creating such a person note in the same form as all his existing person notes. If it’s a recipe, it pulls out the recipe, converts cooking terms and measures to Dutch terms and measures, and makes the ingredients available to dump into a shopping list.
  • Similarly, everywhere he has a ‘copy to clipboard’ button in his Theta, it has a contextually determined template, so he can paste the result into something else in the way he needs it at the destination. I use such templates in different places already: the way I send a bookmark to my blog or my annotation tool, how an annotation is imported into my Obsidian notes, and how I save a webpage in markdown to my notes, are all determined by a template that takes the same basic information but styles and orders it differently based on purpose and destination. (See the small template sketch after this list.)
  • That contextualisation sometimes needs persistent data from outside. He incorporates such data into his local database. E.g. all the place names for the Netherlands, so he can recognise a place name in his own material, or search with any of them across his material. Or the list of translated cooking terms mentioned above.
  • He created his own e-mail client interface (using IMAP to access his mail accounts). This allows him to create processing geared to his own routines. E.g. a button to process an e-mail as a hotel reservation, or as a parcel delivery announcement, or to pull location or event data from it, etc. That information then surfaces in his dashboard where it is made useful. It resulted in a rather long row of specific processing contexts, but I can definitely see the power of it. Just as I tinker with my ‘ideal feedreader’, doing the same for an ‘ideal e-mail interface’, where the point is not to let things reside in e-mail but to make them findable and useful outside of it, makes a lot of sense. And again, because you are predictable to yourself, it is obvious what ‘outside’ means in each instance. (A rough sketch of reading a mailbox this way follows after this list.)
  • He created ‘companion apps’ (using Mac’s Xcode to make them for iOS, I wonder if something similar for Android exists) for his phone, allowing him to access and work with information on the go.
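
To make that ‘structure plus dummy data’ pattern concrete for myself, a minimal sketch of what such a generated importer could look like. This is my own guess at the shape, not Martijn’s actual code: it assumes the better-sqlite3 npm package, and the file name, table and columns are made up.

```js
// Hypothetical importer: generated against a dummy CSV inside Claude Code,
// then run locally on the real export. Assumes `npm install better-sqlite3`.
const fs = require('fs');
const Database = require('better-sqlite3');

const db = new Database('theta.db'); // database name is illustrative
db.exec(`CREATE TABLE IF NOT EXISTS bookings (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  hotel TEXT, city TEXT, checkin TEXT, checkout TEXT
)`);

// Naive CSV handling is fine for dummy data; a real importer would use a proper parser.
const rows = fs.readFileSync(process.argv[2] || 'dummy-bookings.csv', 'utf8')
  .trim().split('\n').slice(1)        // drop the header line
  .map(line => line.split(','));

const insert = db.prepare(
  'INSERT INTO bookings (hotel, city, checkin, checkout) VALUES (?, ?, ?, ?)'
);
const insertAll = db.transaction(rs => rs.forEach(r => insert.run(...r)));
insertAll(rows);

console.log(`Imported ${rows.length} rows into theta.db`);
```

The point of the pattern is that only the dummy file ever goes to Claude; the real CSV only meets the script locally.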
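And a small sketch of the ‘copy to clipboard with a contextual template’ idea: the same kind of item rendered differently depending on where it is headed. Destinations and field names are mine, purely to show the shape.

```js
// Per-destination templates: one item, a different rendering for each target.
const templates = {
  blog: b => `[${b.title}](${b.url})\n\n> ${b.quote}`,
  personNote: p => [
    '---',
    `name: ${p.name}`,
    `role: ${p.role}`,
    `location: ${p.location}`,
    '---',
  ].join('\n'),
  shoppingList: r => r.ingredients.map(i => `- ${i}`).join('\n'),
};

// Pick the template based on destination; the result would go to the clipboard.
function render(destination, item) {
  const template = templates[destination];
  if (!template) throw new Error(`No template for destination: ${destination}`);
  return template(item);
}

console.log(render('personNote', {
  name: 'Jane Example', role: 'Researcher', location: 'Utrecht',
}));
```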
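For the e-mail part, a rough sketch of what reading a mailbox from node.js might look like, assuming the imapflow npm package (a common IMAP client for node.js). Host, account and routing rules are placeholders; nothing here is Martijn’s actual setup.

```js
// Read envelopes from an IMAP inbox and route them into personal processing steps.
// Assumes `npm install imapflow`; credentials and rules are placeholders.
const { ImapFlow } = require('imapflow');

async function main() {
  const client = new ImapFlow({
    host: 'imap.example.com',
    port: 993,
    secure: true,
    auth: { user: 'me@example.com', pass: process.env.IMAP_PASSWORD },
  });

  await client.connect();
  const lock = await client.getMailboxLock('INBOX');
  try {
    // Envelopes only; a real tool would fetch bodies for the messages it processes.
    for await (const msg of client.fetch('1:*', { envelope: true })) {
      const subject = msg.envelope.subject || '';
      // Crude contextual routing, standing in for the row of processing buttons.
      if (/reservation|booking/i.test(subject)) {
        console.log('hotel booking?', subject);
      } else if (/parcel|delivery/i.test(subject)) {
        console.log('parcel announcement?', subject);
      }
    }
  } finally {
    lock.release();
  }
  await client.logout();
}

main().catch(console.error);
```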

On the train home, I started exploring both sqlite and node.js in more detail, to figure out if and how I may want to add it to my local personal tool set.
Can I use this to reignite my work on my personal tool suite? That work is aimed more at facilitating my own processes and helping me achieve outputs.
Despite going to bed late, I woke unexpectedly early, given the holidays and weekend, and felt the need to explore more. So the session definitely kicked something in gear. It does need my personal approach of course, and I have plenty of relevant notes on this from the past years to use for it. Years ago, back in 2017, I already gave the effort a name too, Aazai.
I set up sqlite and node.js this morning to have a sandbox to try some building blocks out.
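
A minimal first sketch of the shape I’m after, along the lines of Martijn’s stack: sqlite for storage, node.js for logic, plain HTML from a local webserver. It assumes the better-sqlite3 npm package and uses throwaway test data.

```js
// Minimal local 'dashboard': one SQLite table, rendered as HTML by a local webserver.
const http = require('http');
const Database = require('better-sqlite3');

const db = new Database('sandbox.db');
db.exec('CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT)');
db.prepare('INSERT INTO notes (text) VALUES (?)').run('hello from sqlite'); // adds a test row each run

http.createServer((req, res) => {
  const notes = db.prepare('SELECT id, text FROM notes ORDER BY id').all();
  const items = notes.map(n => `<li>${n.text}</li>`).join('');
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(`<html><body><h1>Sandbox</h1><ul>${items}</ul></body></html>`);
}).listen(3000, () => console.log('Serving on http://localhost:3000'));
```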

Thanks ChatGPT!
Commenting is open on this website, and that means being engaged in a permanent asymmetric battle against spam. Asymmetric in the sense that, as on any social media platform, it is multiple orders of magnitude easier to automatically create and send out spam, falsehoods and hate speech in extremely large volumes than it is for actual people to weed those out of their timelines and websites.
Most incoming spam filtering is automated away these days, but some of it, especially novel types, is always left for me to moderate myself, as the arms race continues.

A new entrant in the spam battle are AI generated spam comments that have clearly been fed the content of the actual blogpost being commented on. Like other spam they stand out due to their blandness, what they link to, and the fact that the same things get submitted multiple times from different origins, but they are building on the content itself. I guess I should feel flattered.

It is also logical, as both spam and AI generated material are based on the exact same asymmetry. ‘Efficiency’ gains through AI generated text are at best only that at the generation end of things (now see me generate oodles of text in seconds!), yet increase the effort needed at the receiving end to read it, see through the veil of plausibility, verify it and judge it inadequate.


Two examples of AI generated spam comments using the content of the actual blog posts (here a recent week notes posting, and one about donating money for ebooks rather than spending it at Amazon.) One commenter giving ‘undetectable AI’ as their name is a bit of a give-away though.

Any comments on this site already are subject to a Reverse Turing test, with all received material deemed generated until determined created by a person. Clearly this is no longer just a precaution resulting from tongue-in-cheek cleverness, but a must-have part of my toolkit for online interaction.

Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to speed up delivering manuscripts and articles (including suggesting peer review be done within 15 days). This is to meet the quota of ‘content’ for LLM training they promised the AI companies, like Microsoft’s OpenAI, that the publishers sold their souls to: Taylor&Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. No cuts either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing is not about writing down and distributing human knowledge, it seems, but about creating slop for the machines to make even sloppier slop from. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO): garbage in, even more garbage out.

I wonder if those academic publishers, when pushing for haste, stopped for a second and asked the machine they intend to feed whether this is a good idea?
I did. And despite the machine ignoring its own dependence on quality input, presumably because that causality isn’t expressed in its input text or isn’t statistically relevant in weighing input tokens, it at least does generate a text pointing to potential downsides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs

Amy Castor and David Gerard

Bookmarked ChatGPT sees Tweets: A Double-Edged Sword by Henk van Ess

Bing Chat is connected to the internet, allowing internet searches when you ask the chatbot something. This includes Twitter. It then weaves those online finds into the texts it puts together from your prompt. Henk van Ess shows how quickly the content from a Twitter message gets incorporated (and changed if additional messages are available). With just three tweets he influenced Bing Chat output. This also opens a pathway for influence and the dissemination of mis-info, especially since the recent quality changes over at Twitter. The feedback loop this creates (internet texts get generated based on existing internet texts, etc.) will easily result in a vicious circle. (In her recent talk Maggie Appleton listed this as one of her possible futures, using a metaphor I can’t unsee, but which does describe it effectively: Human Centipede Epistemology.)

Bing/ChatGPT’s rapid response to tweets has a double-edged sword. Bing quickly corrects itself based on tweets … But those with specific agendas or biases may attempt to abuse the system … We’ve seen it all before. This is similar to Google Bombing…

Henk van Ess

Bookmarked Project Tailwind by Steven Johnson

Author Steven Johnson has been working with Google and developed a prototype for Tailwind. Tailwind, an ‘AI first notebook’, is intended to bring an LLM to your own source material, and then you can use it to ask questions of the sources you give it. You point it to a set of resources in your Google Drive and what Tailwind generates will be based just on those resources. It shows you the specific source of the things it generates as well. Johnson explicitly places it in the Tools for Thought category. You can join a waiting list if you’re in the USA, and a beta should be available in the summer. Is the USA limit intended to reduce the number of applicants, I wonder, or a sign that they’re still figuring out things like GDPR for this tool? Tailwind is prototyped on the PaLM API though, which is now generally available.

This, from its description, gets to where it becomes much more interesting to use LLM and GPT tools: a localised (not local though, it lives in your Google footprint) tool, where the user defines the corpus of sources used, and with traceable results. As the quote below suggests, a personal research assistant. Not just for my entire corpus of notes as I describe in that linked blogpost, but also for a subset of notes on a single topic or project. I think there will be more tools like these coming in the next months, some of which likely will be truly local and personal.

On the Tailwind team we’ve been referring to our general approach as source-grounded AI. Tailwind allows you to define a set of documents as trusted sources …, shaping all of the model’s interactions with you. … other types of sources as well, such as your research materials for a book or blog post. The idea here is to craft a role for the LLM that is … something closer to an efficient research assistant, helping you explore the information that matters most to you.

Steven Johnson

On the internet nobody knows you’re a dog.

Peter Steiner, 1993

It seems that after years of trollbots and content farms, with generative algorithms we are now rapidly moving past the point where the basic assumption on the web can still be that an (anonymous) author is human until it becomes clear otherwise. Improving our crap detection skills from now on means a different default:

On the internet nobody believes you’re human.

until proven otherwise.