I imported hundreds of Amazon e-book purchases into my book notes, using a script I wrote with the assistance of GitHub Copilot.

As a home-cooking coder, I often take a long time to code things. I know how to cut a problem up so I can code the pieces. I know a bit of my default coding language, PHP, and can read it to see what it does. But actually writing code is a different thing: my fluency is passive rather than active, mostly because I don’t code often enough to become actively fluent, even though I have been coding my own things since the early 1980s. So coding for me means a lot of looking up how statements work, what arguments they expect, etc., which is time consuming.
Over time I’ve collected pieces of functionality I reuse in various other projects, and I have a collection of notes on how to do things and why (it’s not really a coding journal, but it could grow into one). Yet coding something new usually remains very time consuming.

Earlier this year I took out a subscription to GitHub Copilot. I installed two plugins in Visual Studio Code, the text editor I use for coding: Copilot and Copilot Chat. I thought it might make it easier for me to create more personal tools.
It took until yesterday before I both had the urge and the time to test that assumption.

I am backfilling different types of information into my Obsidian notes. Earlier I did that with my Google Calendar items from 2008-2020.
Another is ensuring I have a book note for every book I bought for Amazon Kindle. I’ve bought just over 800 Kindle books since December 2010 (versus 50 physical books since 2008, as I usually use local bookshops for those). For a number of them I have book notes in Obsidian, for others I don’t. I wanted to add notes for all the Kindle books I bought over the years.
And this gave me my first personal tool project to try Copilot on.

The issue: I have a list of Amazon Kindle purchases (title, author, date) and a list of existing book notes, where the note title is usually shorter than the one on the Amazon list (e.g. no subtitle). I set out to make a script that checks every existing book note against the Amazon list and writes the remaining Amazon purchases to a new list. In a next step, that new list is used to create a book note, with a filled-out template, for each entry.
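
To give an idea of that first step, here is a minimal sketch, not my actual script: the CSV layout, file locations and the prefix-matching rule are assumptions. It reads the Amazon purchases from a CSV, collects the existing note titles from the markdown filenames in the book notes folder, and keeps only the purchases for which no note title matches the start of the Amazon title.

```php
<?php
// Minimal sketch, not the actual script. Assumptions: purchases are in a CSV
// with title, author, date columns; existing notes are .md files whose
// filename is the (often shorter) book title.

$amazonCsv   = 'kindle-purchases.csv';
$notesFolder = '/path/to/obsidian/Books';
$outputCsv   = 'kindle-without-note.csv';

// Collect existing note titles from the markdown filenames, lowercased.
$noteTitles = [];
foreach (glob($notesFolder . '/*.md') as $file) {
    $noteTitles[] = mb_strtolower(basename($file, '.md'));
}

$in  = fopen($amazonCsv, 'r');
$out = fopen($outputCsv, 'w');
$header = fgetcsv($in);   // read the header row
fputcsv($out, $header);   // and copy it to the output

while (($row = fgetcsv($in)) !== false) {
    [$title, $author, $date] = $row;
    $haveNote = false;
    // Note titles usually lack the subtitle, so check whether an existing
    // note title matches the start of the (longer) Amazon title.
    foreach ($noteTitles as $noteTitle) {
        if (str_starts_with(mb_strtolower($title), $noteTitle)) {
            $haveNote = true;
            break;
        }
    }
    if (!$haveNote) {
        fputcsv($out, [$title, $author, $date]);
    }
}
fclose($in);
fclose($out);
```

Matching on titles like this is also exactly where the data cleaning mentioned below comes in, as the two lists rarely write a title exactly the same way.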

Using Copilot, and especially the chat function, made the coding quick and easy. It also helped me learn, as the chat provides reasons for its suggestions and I could go back and forth with it to understand various elements better. A very useful effect was that having to write prompts for the chatbot, and following up on its answers, allowed me to clarify much better to myself what I was trying to do and to come up with ideas for how to do it. So it sped up my thinking and creation process, next to providing helpful code suggestions that I only needed to tweak a bit for my use case (rather than finding various solutions on Stack Overflow that don’t really address my issue). It also helped me make useful notes for my coding journal and code snippet collection.
It was still time consuming, but not because of coding: data cleaning is always a big chore, and will remain so because it needs human inspection.

I now have a folder with some 475 automatically created book notes, in the right structure, derived from the 800 Kindle book purchases over 13 years, using my existing book notes as a filter.
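
The note creation step could look something like the sketch below; the frontmatter fields and folder path are assumptions, not my actual template.

```php
<?php
// Minimal sketch of the note creation step. The frontmatter fields and the
// template are assumptions, not my actual Obsidian template.

$inputCsv    = 'kindle-without-note.csv';
$notesFolder = '/path/to/obsidian/Books';

$in = fopen($inputCsv, 'r');
fgetcsv($in); // skip the header row

while (($row = fgetcsv($in)) !== false) {
    [$title, $author, $date] = $row;

    // Strip characters that are not allowed in filenames.
    $filename = preg_replace('/[\/\\\\:*?"<>|]/', '', $title) . '.md';

    // Fill out a simple note template with YAML frontmatter.
    $note = "---\n"
          . "title: \"{$title}\"\n"
          . "author: \"{$author}\"\n"
          . "purchased: {$date}\n"
          . "source: Amazon Kindle\n"
          . "status: unread\n"
          . "---\n\n"
          . "# {$title}\n\n";

    file_put_contents($notesFolder . '/' . $filename, $note);
}
fclose($in);
```
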
Next projects to have a go at will be the physical book purchases through Amazon (50), and my old Calibre library of all the books I owned before 2012 (over 1000; we did away with most of them that year, after I scanned all their barcodes into a database).

I am pleased with how helpful GitHub Copilot was for me in this. It energises me to think of more little coding projects for personal tools. And who knows, maybe it will increase my coding skills too, or have me branch out into programming languages I don’t know, like Python, or help me understand other people’s code, like in WordPress plugins I might want to tweak.

In reply to Creating a custom GPT to learn about my blog (and about myself) by Peter Rukavina

It’s not surprising that GPT-4 doesn’t work like a search engine and has a hard time surfacing factual statements from source texts. Like one of the commenters, I wonder what that means for the data analysis you also asked for. Perhaps those results too are merely plausible, but not actually analysed. Especially the day-of-the-week thing, as that wasn’t in the data, and I wouldn’t expect GPT to determine the weekday of each post in the process of answering your prompt.

I am interested in doing what you did, but with 25 years of notes and annotations, and rather with a different model, one with fewer ethical issues attached. To have a chat about my interests and the links between things. Unlike the fact-based questions you’ve asked the tool, that doesn’t necessarily need to be correct, just plausible enough to surface associations. Such associations might prompt my own thinking and my own searches through the same material.

It also makes me wonder whether what Wolfram Alpha is doing these days gets a play in your own use of GPT+, as they are all about interpreting questions and then giving the answer directly. There’s a difference between things that face the general public, and things that are internal or even personal tools, like yours.

Have you asked it things based more on association yet? Like, for instance, “based on the posts ingested, what would be likely new interests for Peter to explore”? Can you use it to create new associations, to help you generate new ideas in line with your writing/interests/activities shown in the posts?

So my early experiments show me that as a data analysis copilot, a custom GPT is a very helpful guide… In terms of the GPT’s ability to “understand” me from my blog, though, I stand unimpressed.

Peter Rukavina

In reply to Stop Using AI-Generated Images by Robin Rendle

This specific argument sounds way too much like the ‘every music download is a missed sale’ trope of the music industry, 2-3 decades ago. And that was about actual recordings, not generated ones.

There are many good reasons to not use algogens, or to opt for different models for such generation than the most popular public facing tools.

Missed income for illustrators because of their use in blog posts, as Rendle posits, isn’t a likely one.
Like with music downloads, there’s a whole world of potential users underneath the Coasean floor. Several billion people, I assume.

My blog or presentations will never use bought illustrations.
I started making lots of digital photos for that reason way back in 2003, and have been using open Creative Commons licenses, ever since they existed, whenever I used something by another creator.
These days I may try to generate a few images, if it isn’t too work intensive. Which it quickly is (getting a good result is hard and takes more time than a Flickr search for an image), and then paying someone who can handle the tools better than I can (like now, with Adobe Illustrator) might be in order, except for my first point at the start of this paragraph. Let’s also not pretend that all current illustrators do everything by hand, or have done since the 1990s. There’s tons of artwork out there that is only possible because of digital tools. Algogens aren’t such a big jump that you get to treat them as a different species.

Not to say that, outside the mentioned use case of blogs and other sites (the ones that even now are indistinguishable from generated texts and only have generating ad eyeballs as their purpose), the lower end of the existing market won’t get eroded, unless that lower end ups its game with these tools (they’re actually better positioned for it than I am in terms of skills and capabilities).
I bet that at the same time there will be a growing market for clearly human-made artefacts as status symbols too. The Reverse Turing effect in play, as the default assumption will be, must be, that anything is generated. I’ve paid more for prints of artwork, both graphics and photos, made by or at the behest of the artist, than for one printed after their death, for instance. Such images adorn the walls at home rather than my blog, though.

I always thought those AI-generated images in blog posts and elsewhere around the web were tacky and lame, but I hadn’t seriously considered the future economic impact on illustrators.

Robin Rendle


Image generated with Stable Diffusion using the prompt “A woman is sitting behind a computer making an image of a woman drawing, sketchbook style”. The result is an uncanny image. Public domain image.

Bookmarked China Seeks Stricter Oversight of Generative AI with Proposed Data and Model Regulations (by Chris McKay at Maginative)

Need to read this more closely. A few things stand out at first glance:

  • This is an addition to the geopolitical stances that the EU, US and China put forth w.r.t. everything digital and data. A comparison with the EU AI Regulation that is under negotiation would be of interest.
  • It seems focused solely on generative AI. Are there other (planned) acts covering other AI applications and development? Why is generative AI singled out here? Because it has a more directly population-facing character?
  • It seems to mostly front-load the responsibilities towards the companies producing generative AI applications, i.e. towards the models used and pre-release. In comparison, the EU regulation incorporates responsibilities for distributors, buyers, users, and even users of output only, and spans the full lifetime of any application.
  • It lists specific risks in several categories. How specifically are those risks worded? Might there be an impact on how future-proof the regulation is? Are there thresholds introduced for such risks?

Let’s see if I can put some AI to work to translate the original Chinese proposed text (PDF).
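
A rough sketch of how that could go, assuming pdftotext (from poppler-utils) is installed and an OpenAI API key is available; the model name and chunk size are guesses rather than tested choices.

```php
<?php
// Rough sketch only. Assumptions: pdftotext (poppler-utils) is installed,
// an OpenAI API key is in the environment, and the model name and chunk
// size are guesses rather than tested choices.

$pdf    = 'china-genai-regulation.pdf';
$apiKey = getenv('OPENAI_API_KEY');

// Extract the plain text from the PDF to stdout.
$text = shell_exec('pdftotext ' . escapeshellarg($pdf) . ' -');

$translation = '';
// Split on characters, not bytes, so Chinese characters aren't broken up.
foreach (mb_str_split($text, 2000) as $chunk) {
    $payload = json_encode([
        'model'    => 'gpt-4',
        'messages' => [
            ['role' => 'system', 'content' => 'Translate this Chinese legal text into English.'],
            ['role' => 'user',   'content' => $chunk],
        ],
    ]);

    $ch = curl_init('https://api.openai.com/v1/chat/completions');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        CURLOPT_POSTFIELDS     => $payload,
    ]);
    $response = json_decode(curl_exec($ch));
    curl_close($ch);

    $translation .= $response->choices[0]->message->content . "\n";
}

file_put_contents('china-genai-regulation-en.txt', $translation);
```
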

Via Stephen Downes, who is also my source for the link to the original proposal in PDF.

By emphasizing corpus safety, model security, and rigorous assessment, the regulation intends to ensure that the rise of [generative] AI in China is both innovative and secure — all while upholding its socialist principles.

Chris McKay at Maginative

Favorited I’m banned for life from advertising on Meta. Because I teach Python. by Reuven Lerner

The Python programming language is over 30 years old; the Pandas data analysis library for it is 15 years old. It’s not unlikely that Meta’s ad-checking AI was created using both somewhere in the process. But the programming context of both words was definitely not in its training set.

Provide Python and Pandas training, advertise on FB. Get blocked because Meta’s AI spits out a high probability it is about illegal animal trade. Appeal the decision. Have the same AI, not a person, look at it again and conclude the same thing. Get blocked for all time. Have insiders check and conclude this can’t be reversed.

‘Computer says no…’ and Kafka had a child, and it’s Meta’s AI. And Meta has no human-operated steering wheel that is connected to anything meaningful.

Via Ben Werdmuller

I’m a full-time instructor in Python and Pandas, teaching in-person courses at companies around the world … Meta’s AI system noticed that I was talking about Python and Pandas, assumed that I was talking about the animals […], and banned me. The appeal that I asked for wasn’t reviewed by a human, but was reviewed by another bot, which (not surprisingly) made a similar assessment.

Reuven Lerner

Bookmarked a tweet by Frankwatching

Nothing is as personal as having a machine write your heartfelt cry! Technical mediation only brings you closer together. I hope the Frankwatching team sees the irony of their own text.

How do you use AI … to communicate more personally?

Frankwatching

(Incidentally, I also notice that saving individual tweets to The Web Archive doesn’t work. Earlier that did work for me. A screenshot it is, then, with the usual caveat.)