I have a little over 25 years’ worth of various notes and writings, and a little over 20 years of blog posts: a corpus that reflects my life, interests, attitude, thoughts, interactions and work over most of my adult life. Wouldn’t it be interesting to run that personal archive as my own chatbot, to specialise an LLM for my own use?

Generally I’ve been interested in using algorithms as personal or group tools for a number of years.

For algorithms to help, like any tool, they need to be ‘smaller’ than us, as I wrote in my networked agency manifesto. We need to be able to control its settings, tinker with it, deploy it and stop it as we see fit.
Me, April 2018, in Algorithms That Work For Me, Not Commoditise Me

Most, if not all, of the algorithms we are exposed to online, however, treat us as a means to manipulate our engagement. I see them as potentially very valuable tools in working with lots of information. But not in their current common incarnations.

Going back to a less algorithmic way of dealing with information isn’t an option, nor something to desire I think. But we do need algorithms that really serve us, perform to our information needs. We need fewer algorithms that purport to aid us in dealing with the daily river of newsy stuff, but really commoditise us at the back-end.
Me, April 2018, in Algorithms That Work For Me, Not Commoditise Me

Some of the things I’d like my ideal RSS reader to be able to do are along such lines, e.g. to signal new patterns among the people I interact with, or outliers in their writings. Basically to signal social eddies and shifts among my network’s online sharing.

LLMs are highly interesting in that regard too because, in contrast to engagement-optimising social media algorithms, they focus on large corpora of text and the generation thereof, not on emergent social behaviour around texts. Once trained on a large enough generic corpus, such a model could potentially be tuned with a specific corpus: specific to a niche topic, or to the interests of a single person, a small group of people or a community of practice. Such as all of my own material: decades’ worth of writings, presentations, notes, e-mails etc. The mirror image of me as expressed in all my archived files.

For me, doing so with a personal corpus has a few prerequisites:

  • It would need to be a separate instance of whatever tech it uses. If possible self-hosted.
  • There should be no feedback to the underlying generic and publicly available model, and no bleed-over into other people’s interactions with that model.
  • The separate instance needs an off-switch under my control, where off means none of my inputs are available for use someplace else.

Running your own Stable Diffusion image generator set-up, as E currently does, complies with this, for instance.

Doing so with an LLM text generator would create a way of chatting with my own PKM material, ChatPKM: a way to interact with my Avatar (differently than through search and links, as I do now), and not just with my blog but with all my notes. It might adopt my personal style and phrasing in its outputs. When (not if) it hallucinates, it would be my own trip, so to speak. Because it would be clear which inputs are in play for the specialisation, verification and references should be easier to follow up on. It would be a personal prompting tool, a way to communicate with your own pet stochastic parrot.

Current attempts at chatbots in this style seem to focus on things like customer interaction. Feed it your product manual, have it chat to customers with questions about the product. A fancy version of ‘have you tried switching it off and back on?’ These services allow you to input one or a handful of docs or sources, and then chat about their contents.
One of those is Chatbase, another is ChatThing by Pixelhop. The latter has the option of continuously adding source material to, presumably, the same chatbot(s), but more or less on a per-file and per-URL basis, and limited in the number of words per month. That is a far cry from starting out with half a GB of markdown notes and writings covering several decades, let alone tens of GBs of e-mail interactions.

Pixelhop, however, is currently working with Dave Winer to do some of what I describe above: use Dave’s entire blog archives as input. Dave has been blogging since the mid-1990s, so there’s quite a lot of material there.
Checking out ChatThing suggests that it is built on OpenAI’s ChatGPT 3.5 through its API, so it wouldn’t meet the prerequisites I listed. Yet purposely feeding it a specific online blog archive is less problematic than including my own notes, as all the source material involved is public anyway.
The resulting Scripting News bot is a fascinating experiment, and you can follow the work around it on GitHub. (As part of that, Dave also shared a markdown version of his complete blog archives (33MB), which for fun I loaded into Obsidian to search through. I also used it to compare against the chatbot’s generated outputs, such as its answer when Dave asked the bot when he first wrote about the iPhone on his blog.)
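Having the archive at hand makes that kind of check easy: find the earliest post that mentions a term and compare it with what the bot says. A minimal sketch, assuming the archive has already been parsed into (date, text) pairs; the function name and that data shape are hypothetical, as the structure of Dave’s markdown export isn’t described here.

```python
import re
from datetime import date
from typing import List, Optional, Tuple


def first_mention(posts: List[Tuple[date, str]], term: str) -> Optional[date]:
    """Return the earliest date whose post text mentions `term`, or None.

    Matching is case-insensitive and on whole words, so "iPhone" also
    matches "iphone" but not "iPhoneOS". `posts` need not be sorted.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    dates = [posted for posted, text in posts if pattern.search(text)]
    return min(dates) if dates else None
```

Comparing `first_mention(posts, "iPhone")` with the date the bot gives is then a one-line check, and the matching post is the reference to follow up on.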

Looking forward to more experiments by Dave and Pixelhop. Meanwhile I’ve joined Pixelhop’s Discord to follow their developments.

In reply to Finding Connectors in Mastodon by Julian Elve

This reminds me of a similar comparison I did for Twitter years ago, when the birdsite was young. I looked at profiles to see whether they seemed in it for the conversation, in it to actually connect. Those would have a balanced ratio of followed to followers, in contrast with profiles that were ‘large antenna arrays’ (many more followed than followers) or ‘A-listers’ (many more followers than followed). I dubbed it conversational symmetry in that post back then. And yes, Valdis Krebs comes to mind too.

Although connectors are defined by their behaviour, in that they join up those who seek knowledge with those who share it, it was suggested that we look at individuals who had a high ratio of followers to followed as a starting point. …. there’s part of me that’s not convinced that follower ratio is a good measure for who is a ‘Connector’ – perhaps a good Connector would tend to show a more balanced ratio of followers / follows? … in pragmatic terms I am pretty happy with my ad hoc observation that Connectors seem to be “balanced”…

Julian Elve
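The conversational symmetry heuristic, and Julian’s “balanced” observation, boil down to a simple ratio check. A minimal sketch, assuming a factor of 2 marks a profile as lopsided; that threshold and the function name are hypothetical, not the cut-off either post used.

```python
def classify_profile(following: int, followers: int, tolerance: float = 2.0) -> str:
    """Label a profile by how symmetric its followed/followers counts are.

    `tolerance` is an illustrative cut-off: a ratio beyond it in either
    direction counts as lopsided.
    """
    if following == 0 and followers == 0:
        return "inactive"
    # Treat zero as one so the ratio is always defined.
    ratio = max(following, 1) / max(followers, 1)
    if ratio >= tolerance:
        return "antenna array"  # follows many more than follow back
    if ratio <= 1 / tolerance:
        return "A-lister"  # followed by many more than it follows
    return "balanced"  # conversational symmetry: candidate connector
```

A ratio is only a starting point, as Julian notes: connectors are defined by behaviour, so a “balanced” label flags profiles worth looking at rather than confirming anything.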

I read daily, and browse bookstores often. At times you pick up a book in a store, or come across it online, look at it and think that it might be interesting, only to conclude it isn’t and leave it. Until you encounter it the next time, think it interesting, and again conclude it isn’t. Some of those might become interesting at a different point in time, when my own interests have shifted to align with them better.

Others I’d better not read because they’re badly written, or because there are strong indications the content doesn’t live up to its backflap pitch. Better to spend my reading time on a different book.

For that group I want to break out of the repeated ‘oh this might be interesting…..oh it’s not’ cycle. I already keep notes about books I haven’t bought yet but might. It’s a sort of preselection stage before both my current reading stack and my anti-library. I have now added a Won’t Read List, for books I haven’t bought and want to ensure my future self won’t buy either. You might say it’s a type of critical ignoring. I do positive curation in my notes, and am now adding negative curation too, for those books I have repeatedly encountered and rejected.

Through my feeds I follow the book notes and recommendations of other bloggers, and have found some fun and great books through them. I never had a use for their negative recommendations before, but now there’s a way to curate those for myself too.

The inaugural version of my personal ‘Won’t read list’ has two books. Maybe I’ll add it to the OPML book lists I share online.
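If the list does end up as OPML, Python’s standard library is enough to produce it. A minimal sketch: the book entries are placeholders (the two books aren’t named here), and the `author` attribute and function name are assumptions for this sketch; the OPML book lists already shared online may use a different structure.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def wont_read_opml(books: List[Tuple[str, str]]) -> str:
    """Serialise (title, author) pairs as a minimal OPML 2.0 outline."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "Won't Read List"
    body = ET.SubElement(opml, "body")
    for title, author in books:
        # OPML requires a `text` attribute on each outline; `author` is an
        # extra attribute chosen for this sketch, not a standard OPML one.
        ET.SubElement(body, "outline", text=title, author=author)
    return ET.tostring(opml, encoding="unicode")
```

Keeping one outline element per book means the negative list can sit alongside positive book lists in the same format and be read by the same tools.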

A useful tip from Nicole van der Hoeven that I adopted in the past few days: using the title of a note also as a linked heading inside the note. This is especially useful since the change in Obsidian that de-emphasizes the title of a note in the interface. My note titles are meaningful, Andy Matuschak style, and usually result from writing the note or from writing another note from which it branches off. (My titles also contain a timestamp, ensuring they’re unique and making it possible to place a note in time. Example: “Optimal unfamiliarity 20040107122600”) Having it as the primary heading in the note itself means I am more likely to think about improving the title as I develop the note over time.

Personally, I put the filename as a heading, but I also put it as a link. When I rename the file, the link (which is also the heading), automatically updates.
Nicole van der Hoeven

Screenshot of what that looks like in practice: the note “Optimal unfamiliarity” with its title as a heading that links to the note itself.
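The convention, a timestamped title repeated as a self-linking heading, is easy to generate when creating a note. A minimal sketch, assuming notes are plain markdown files named after their title; the function name is hypothetical, and whether Obsidian rewrites the link on rename depends on its automatic link-updating setting being enabled.

```python
from datetime import datetime
from typing import Tuple


def new_note(title: str, created: datetime) -> Tuple[str, str]:
    """Return (filename, initial body) for a timestamped note."""
    # Append a YYYYMMDDHHMMSS timestamp so titles are unique and datable.
    full_title = f"{title} {created:%Y%m%d%H%M%S}"
    # Put the title in the note as a level-1 heading that is also a wikilink
    # to the note itself, so renaming the file can update the heading too.
    body = f"# [[{full_title}]]\n\n"
    return f"{full_title}.md", body
```

For example, `new_note("Optimal unfamiliarity", datetime(2004, 1, 7, 12, 26))` yields the filename `Optimal unfamiliarity 20040107122600.md` with `# [[Optimal unfamiliarity 20040107122600]]` as its first line.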

I added ChatGPT to Obsidian following these steps to install an experimental plugin. I also set up a pay-as-you-go account with OpenAI, as I’ve used up my trial period for both DALL-E and GPT-3.

At first glance the GPT-3 plugin works fine, although the actual chat part seems to be missing: the back and forth where you build on a response with further prompts. I think the power of ChatGPT as a tool lies in that iterative prompting.
You can still iterate by prompting ChatGPT with the entire previous exchange, but that is slightly cumbersome: you’d have to go into a note, delete the previous exchange, and then have the plugin re-add it plus the next generated part.
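Re-sending the whole previous exchange amounts to keeping the transcript yourself and prepending it to every new prompt, which is what a chat interface does for you. A minimal sketch, assuming a hypothetical `complete(prompt)` function standing in for whatever completion call the plugin makes; it is not the plugin’s code or OpenAI’s API.

```python
from typing import Callable, List, Tuple


def chat_turn(history: List[Tuple[str, str]], user_msg: str,
              complete: Callable[[str], str]) -> str:
    """Run one chat turn by resending the whole previous exchange.

    `complete` is a hypothetical stand-in for a text-completion call.
    `history` is a list of (role, text) pairs and is updated in place.
    """
    lines = [f"{role}: {text}" for role, text in history]
    lines += [f"User: {user_msg}", "Assistant:"]
    reply = complete("\n".join(lines))
    # Record both sides so the next prompt builds on this response.
    history.append(("User", user_msg))
    history.append(("Assistant", reply))
    return reply
```

The catch is that the prompt grows with every turn, so long exchanges eventually run into the model’s context limit and cost more per request.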

I think it will be more fruitful to conduct the entire exchange with ChatGPT in the browser and manually grab the content if I want to use it in Obsidian. The MarkDownload browser extension capably grabs the entire ChatGPT exchange and stores it as markdown in my Obsidian notes.