Bookmarked The Expanding Dark Forest and Generative AI by Maggie Appleton

I very much enjoyed this talk that Maggie Appleton gave at Causal Islands in Toronto, Canada, 25-27 April 2023. It reminds me of the fun and insightful keynotes at the Reboot conferences a long time ago, some of which shifted my perspectives long-term.

This talk is about how we will experience and use the web once generative algorithms create most of its content. Appleton explores the potential effects of that, and the futures that might result. She puts human agency at the centre of how we choose our path forward in experimenting with and using ‘algogens’ on the web, and in navigating an internet where nobody believes you’re human.

Appleton is a product designer at Ought, working on products that use language models to augment and extend human (cognitive) capabilities. Ought makes Elicit, a tool that surfaces (and summarises) potentially useful papers for your research questions. I use Elicit every now and then, and really should use it more often.

An exploration of the problems and possible futures of flooding the web with generative AI content

Maggie Appleton

During our visit to the Neues Museum in Nuremberg last week, this mind map stood out to me. Art collector, dealer and curator René Block (born 1942) made it as a sort of work autobiography for the period 1964-2014.

It stood out to me because it shows the evolution of his work and the connections between major phases and individual projects.

I have a list of a few ‘big themes’ I’ve been interested in and worked on, in that order, as my work most often grew out of a side interest during a previous phase (also when I was employed). Over time I’ve recognised the overall topic that carries them all: a fascination with the affordances digital technology offers for our agency, and how it impacts how we live, learn, work and organise.

At any given moment I may think that most of my activities are a coincidence, that I happened across them on my generally undirected path, but my blog archive has often shown me that I had already mentioned topics and ideas much earlier.
There’s an evolution to them, and since I’ve spotted the ‘carrier theme’ I trust that evolution.

I’m sure I could make a mind map like the one above with the different client projects, activities and key events of the past 26 years. Maybe everyone should make such a map for themselves from time to time, if only to spot the adjacent paths within one’s reach in the evolutionary plane of possibilities. It seems Block made his at the end of his working life, when he was 72. What might it have told him if he had drawn, or redrawn, it at earlier moments?

I have installed AutoGPT and started playing with it. AutoGPT is a locally installed and run piece of software (in a terminal window) that you can, in theory, give a goal and then let run until it achieves that goal. It’s experimental, so it is good advice to follow along with its steps and approve the individual actions it suggests.
It interacts with different generative AI tools (through your own API keys) and can initiate various actions, including online searches, as well as spawning new interactions with LLMs like GPT-4 and using the results in its ongoing process. It chains these prompts and interactions together to get to a result (‘prompt chaining’).
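
To make ‘prompt chaining’ concrete: the answer to one model call becomes part of the prompt for the next. A minimal sketch of that pattern in Python, using the pre-1.0 OpenAI library; the model name, the prompts and the two-step structure are my own illustration, not what AutoGPT actually sends:

```python
# A minimal sketch of prompt chaining: each answer feeds the next prompt.
# Assumes the pre-1.0 openai library and an OPENAI_API_KEY in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def ask(prompt: str) -> str:
    """Send a single prompt to the chat model, return its text answer."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

goal = "Compile a list of the e-mail addresses of the people in my company."
# Step 1: have the model break the goal down into a plan.
plan = ask(f"Break this goal into concrete, numbered steps: {goal}")
# Step 2: feed that plan back in, asking for the first concrete action.
action = ask(f"Given this plan:\n{plan}\nWhat single action should be taken first?")
print(action)
```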

I had to tweak the scripts a little (they call python and pip, but on my machine need to call python3 and pip3), but then it works.

Initially I set it up with OpenAI’s API, as the online guides I found were using that. However, in the settings file I noticed I can also choose other LLMs, like the publicly available models on Hugging Face, as well as image-generating AIs.
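
Swapping in a publicly hosted model then becomes a matter of pointing at a different endpoint with your own token. A minimal sketch using Hugging Face’s hosted Inference API; the model choice and the environment variable name are my own assumptions:

```python
# A minimal sketch of querying a publicly hosted model on Hugging Face
# instead of OpenAI. Model choice and env var name are illustrative.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-large"
HEADERS = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_API_TOKEN']}"}

def query(prompt: str) -> str:
    """Send a prompt to the hosted model and return the generated text."""
    response = requests.post(API_URL, headers=HEADERS, json={"inputs": prompt})
    response.raise_for_status()
    return response.json()[0]["generated_text"]

print(query("Summarise why self-hosting a language model matters."))
```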

I first attempted to let it write scripts to interact with the hypothes.is API. It ended up in a loop, saying it needed to read the API documentation but not finding it. At that point I did not yet intervene myself (for instance by supplying the link to the API documentation). When I did so later, it either couldn’t come up with next steps, or ingested only the first few lines of the API documentation rather than all of it, which also led to empty next steps.
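
For context, the kind of script I was hoping it would write by itself is not complicated. A minimal sketch of fetching a user’s public annotations from the Hypothes.is API (the username is a placeholder):

```python
# A minimal sketch of what I wanted AutoGPT to produce on its own:
# fetch recent public annotations for a user from the Hypothes.is API.
import requests

API_URL = "https://api.hypothes.is/api/search"

def recent_annotations(user: str, limit: int = 20) -> list:
    """Return recent public annotations for a Hypothes.is user."""
    params = {"user": f"acct:{user}@hypothes.is", "limit": limit}
    response = requests.get(API_URL, params=params)
    response.raise_for_status()
    return response.json()["rows"]

for row in recent_annotations("exampleuser"):  # placeholder username
    print(row["uri"])
```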

Then I tried a simpler thing: give me a list of all email addresses of the people in my company.
It did a Google search for my company’s website, and then looked at it. The site is in Dutch, which it didn’t notice, and it concluded there wasn’t a page listing our team. I then provided it with the link to the team page, and it parsed that correctly, ending up with a list of e-mail addresses saved to file, while also neatly summarising what we do and what our expertise is.
While this second experiment concluded successfully, it did require my own intervention, and the set task was relatively simple (scrape something from this webpage here). That makes it of limited usefulness, although it did take less time than doing it myself. It points to the need to have a pretty clear picture of what you want to achieve and how to achieve it, so that you can provide feedback and input at the right steps in the process.
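
For comparison, the manual version of that scraping task is only a few lines of Python; the URL is a placeholder and the e-mail pattern is deliberately simple:

```python
# A minimal sketch of the task I set AutoGPT: scrape e-mail addresses
# from a known team page and save them to file. The URL is a placeholder.
import re
import requests

page = requests.get("https://example.com/team")
page.raise_for_status()

# A simple (not exhaustive) pattern for e-mail addresses in the page source.
emails = sorted(set(re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", page.text)))

with open("emails.txt", "w") as f:
    f.write("\n".join(emails))
print(f"Saved {len(emails)} addresses.")
```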

As with other generative AI tools, the right prompting is key, and the burden of learning effective prompting lies with the human tool user; the tool itself provides no guidance in this.

I appreciate that it’s an early effort, but I can’t reproduce the enthusiastic results others claim. My first estimation is that the claims I’ve seen are based on hypothetical tasks used as prompts, followed by enthusiasm about the plausible-looking outcomes. If you try an actual task where you know the desired result, it easily falls flat. This is similar to how ChatGPT can provide plausible texts, except when the prompter knows what good-quality output for a given prompt looks like.

It is tempting to keep playing with this thing nevertheless, because of its positioning as a personal tool, and as a potential step towards what I earlier dubbed narrow-band digital personal assistants. I will continue to explore, first by latching onto the APIs of generative AI models more open than OpenAI’s.

I have a little over 25 years’ worth of various notes and writings, and a little over 20 years of blogposts. A corpus that reflects my life, interests, attitude, thoughts, interactions and work over most of my adult life. Wouldn’t it be interesting to run that personal archive as my own chatbot, to specialise an LLM for my own use?

Generally I’ve been interested in using algorithms as personal or group tools for a number of years.

For algorithms to help, like any tool, they need to be ‘smaller’ than us, as I wrote in my networked agency manifesto. We need to be able to control its settings, tinker with it, deploy it and stop it as we see fit.
Me, April 2018, in Algorithms That Work For Me, Not Commoditise Me

Most if not all of our exposure to algorithms online, however, treats us as a means to an end: our engagement is what gets manipulated. I see algorithms as potentially very valuable tools for working with lots of information, but not in their current common incarnations.

Going back to a less algorithmic way of dealing with information isn’t an option, nor something to desire I think. But we do need algorithms that really serve us, perform to our information needs. We need less algorithms that purport to aid us in dealing with the daily river of newsy stuff, but really commoditise us at the back-end.
Me, April 2018, in Algorithms That Work For Me, Not Commoditise Me

Some of the things I’d like my ideal RSS reader to be able to do are along such lines, e.g. to signal new patterns among the people I interact with, or outliers in their writings. Basically to signal social eddies and shifts among my network’s online sharing.
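
As a very rough sketch of one such signal, under my own assumptions about what counts as an ‘eddy’: flag feeds whose posting frequency this week strongly deviates from their own longer-term average. The feed URLs are placeholders; a real reader would take them from its subscription list.

```python
# A rough sketch: flag feeds whose posting frequency this week deviates
# strongly from their own longer-term average. Feed URLs are placeholders.
import time
import feedparser

FEEDS = ["https://example.com/feed1.xml", "https://example.com/feed2.xml"]
WEEK = 7 * 24 * 3600

for url in FEEDS:
    feed = feedparser.parse(url)
    stamps = [time.mktime(e.published_parsed)
              for e in feed.entries if getattr(e, "published_parsed", None)]
    if len(stamps) < 5:
        continue  # too little data for a baseline
    weeks_spanned = max((max(stamps) - min(stamps)) / WEEK, 1)
    baseline = len(stamps) / weeks_spanned  # average posts per week
    this_week = sum(1 for t in stamps if t > time.time() - WEEK)
    if this_week > 2 * baseline or this_week < baseline / 2:
        print(f"Eddy at {url}: {this_week} posts this week vs ~{baseline:.1f}/week")
```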

LLMs are highly interesting in that regard too: in contrast to the engagement-optimising social media algorithms, they are focused on large corpora of text and the generation thereof, not on emergent social behaviour around texts. Once trained on a large enough generic corpus, one could potentially tune such a model with a specific corpus. Specific to a certain niche topic, or to the interests of a single person, a small group of people, or a community of practice. Such as all of my own material: decades’ worth of writings, presentations, notes, e-mails etc. The mirror image of me as expressed in all my archived files.

Doing so with a personal corpus has, for me, a few prerequisites:

  • It would need to be a separate instance of whatever tech it uses. If possible self-hosted.
  • There should be no feedback to the underlying generic and publicly available model, there should be no bleed-over into other people’s interactions with that model.
  • The separate instance needs an off-switch under my control, where off means none of my inputs are available for use someplace else.

Running your own Stable Diffusion image generator set-up, as E currently does, complies with this, for instance.

Doing so with an LLM text generator would create a way of chatting with my own PKM material, a ChatPKM: a way to interact with my Avatar (not just my blog, but all my notes) differently than through search and links, as I do now. It might adopt my personal style and phrasing in its outputs. When (not if) it hallucinates, it would be my own trip, so to speak. It would be clear which inputs are in play with respect to the specialisation, so verification and references should be easier to follow up on. It would be a personal prompting tool, to communicate with your own pet stochastic parrot.
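
What that could look like in practice, within the prerequisites above: a self-hosted retrieval step that finds the notes most relevant to a question, which would then be handed to a locally run model as context. A minimal sketch with a locally run embedding model; the notes path, model choice and top-5 cut-off are my own assumptions:

```python
# A minimal self-hosted sketch of 'ChatPKM': embed markdown notes locally,
# then retrieve the ones most relevant to a question. The retrieved notes
# would be passed to a locally run LLM as context; nothing leaves my machine.
from pathlib import Path
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small model, runs locally

notes = list(Path("~/notes").expanduser().glob("**/*.md"))  # placeholder path
texts = [note.read_text(encoding="utf-8") for note in notes]
embeddings = model.encode(texts, convert_to_tensor=True)

question = "What did I write about networked agency?"
scores = util.cos_sim(model.encode(question, convert_to_tensor=True), embeddings)[0]

# Show the five most relevant notes, candidates for the LLM's context window.
for score, note in sorted(zip(scores.tolist(), notes), reverse=True)[:5]:
    print(f"{score:.2f}  {note.name}")
```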

Current attempts at chatbots in this style seem to focus on things like customer interaction. Feed it your product manual, have it chat to customers with questions about the product. A fancy version of ‘have you tried switching it off and back on?‘ These services allow you to input one or a handful of docs or sources, and then chat about its contents.
One of those is Chatbase, another is ChatThing by Pixelhop. The latter has the option of continuously adding source material to, presumably, the same chatbot(s), but more or less on a per-file and per-URL basis, and limited in the number of words per month. That’s not the same as starting out with half a GB of markdown notes and writings covering several decades, let alone tens of GBs of e-mail interactions.

Pixelhop is, however, currently working with Dave Winer to do some of what I mention above: use Dave’s entire blog archives as input. Dave has been blogging since the mid-1990s, so there’s quite a lot of material there.
Checking out ChatThing suggests it is built on OpenAI’s GPT-3.5 through its API, so it wouldn’t qualify under the prerequisites I mentioned. Yet purposely feeding it a specific online blog archive is less problematic than including my own notes, as all the source material involved is public anyway.
The resulting Scripting News bot is a fascinating experiment, the work on which you can follow on GitHub. (As part of that, Dave also shared a markdown version of his complete blog archives (33MB), which for fun I loaded into Obsidian to search through, also for comparison with the generated outputs from the chatbot, such as when Dave asked the bot when he had first written about the iPhone on his blog.)

Looking forward to more experiments by Dave and Pixelhop. Meanwhile I’ve joined Pixelhop’s Discord to follow their developments.

With the release of various interesting text generation tools, I’m starting an experiment this and next month.

I will be posting computer-generated text, prompted by my own current interests, to a separate blog and Mastodon account. For two months I will explore whether and how such generated texts create interaction with and between people, and how that feels.

There are several things that interest me.

I currently experience generated texts as often bland, as flat planes of text that don’t hint at any richness of experience on the part of the author behind them. The texts are fully self-contained; they don’t acknowledge a world outside themselves, let alone incorporate facets of that world. In a previous posting I dubbed this an absence of ‘proof of work’.

Looking at human agency and social media dynamics, asymmetries often take agency away. It is many orders of magnitude easier to (auto)post disinformation or troll than it is for individuals to guard and defend against it. Generated texts seem to introduce new asymmetries: it is much cheaper to generate and share reams of text than it is, in terms of attention and reading time, for an individual to determine whether they are actually engaging with someone’s intentionally expressed meaning, or are confronted with a type of output where only the prompt that created it held human intention.

If we interact with a generated text by ourselves, does that convey meaning or learning? If annotation is conversation, what does annotating generated texts mean to us? If multiple annotators interact with each other, does new meaning emerge? Does meaning shift?

Can computer generated texts be useful or meaningful objects of sociality?

Right after I came up with this, my Mastodon timeline served me this post by Jeff Jarvis, which seems a good example of the things to explore:


I posted this imperfect answer from GPTchat and now folks are arguing with it.

Jeff Jarvis

My computer-generated counterpart in this experiment is Artslyz Not (which is me and my name, having stepped through the looking glass). Artslyz Not has a blog and a Mastodon account. Two computer-generated images show us working together, and posing together for an avatar.


The generated image of a person and a humanoid robot writing texts


The generated avatar image for the Mastodon account

Newton famously said about his work in 1675 that if he had seen further, it was by standing on the shoulders of giants.*

In doing so he acknowledged the lineage of the things he worked on, to which he added his own combinatory creativity, gaining us all very considerable new insights.

This weekend, reading Chris Aldrich’s essay about the often actively ignored history of the things that make up the current wave of note-making methods and tools, Newton’s turn of phrase crossed my mind again. Aldrich shows that what is currently mostly discussed as building on a single person’s practice from the 1950s to the 1990s is actually a very widely used set of practices going back many centuries.

I think this is a common pattern, repeating endlessly in bigger and smaller forms, because we’re human.
I also think the often re-used ‘shoulders of giants’ metaphor makes it worse, actively hiding the useful history of most things.

Every output is the result of processes and building blocks that go beyond the person making the output. And most of those inputs and earlier practices are of mundane origin. There aren’t that many giants around on whose shoulders we can all stand for everything we come up with.

Everything has a lineage, and all those lineages have something to tell you about the current state of the things you’re working on. Along all those lineages, knowledge and experience have been lost, not because they weren’t useful moving forward, but because they weren’t transferred well enough. Going back along such a lineage, to use it as feedback on your current practice, can be tremendously valuable.
It’s something actively used as a tool of exploration in, for example, ‘the future, backwards’ exercises. In PKM, the example that triggered this posting, it is common to use your old self in that way (talking about learning from ‘previous me’ and, as ‘current me’, writing notes for ‘future me’). It’s even what makes Earth’s tree of life special.

It takes looking for that lineage as a first step. Yet if all we then do is scan history’s horizon for giants that stand out, we may find none, several, or, most often, one nearby that will have to do as big enough, and simply assume that is where the horizon lies, overlooking everyone else that any giants, and we ourselves, are always building on.

Everything has a deep lineage with a story to tell. Everyone stands on everyone’s shoulders, shoulders of all sizes. It is shoulders all the way down.


If I have seen further it is by knowingly standing on the shoulders of ~~Giants~~ everyone.

Photo ‘Santa Teresa, feria del Vendrell 2019’ by Joan Grífols, licensed CC BY-NC-SA

* Most of us will recognise Newton’s 1675 phrase from a letter he wrote. Newton probably knew it was much older. But do we, individually, generally acknowledge that it was at least 500 years old when he wrote it down, or do we usually attribute it to Newton?