I’m looking for a PHP script with no (or few, or bundled) dependencies to convert HTML to Markdown. I want to be able to save clipped web articles directly from my feed reader into my local notes folder. Markdown to HTML I already have; HTML to Markdown is what I’m looking for.
I extended the capabilities of my microsub feed reader with the option to save web articles directly from the reader to my Obsidian notes in markdown format.
Until now, if I wanted to save an entire article I found in my feed reader, I would open it in the browser, use the markdownclipper browser add-on to add some context, and then save the article as markdown in my notes. I wanted to cut out that step of opening it in the browser by saving it directly from the feed reader to my markdown notes. My feed reader already has a response form that I use to, for example, post a reply to a posting on my own site. Saving to my notes means adding another path to how I process that form.
First I had to find a suitable script for converting HTML to Markdown, which I found in PHP League’s HTML-to-Markdown, as suggested by Jan Boddez. It requires Composer, which I already had installed on my laptop.
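A minimal sketch of this conversion step, assuming the package was installed with `composer require league/html-to-markdown` and that Composer's autoloader sits in a `vendor/` folder next to the script. The function name `clipHtmlToMarkdown`, the autoloader path and the chosen options are assumptions for illustration, not the actual code in my reader.

```php
<?php
// Sketch only: assumes league/html-to-markdown was installed via Composer.
// The autoloader location is an assumption; adjust it to your own setup.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

use League\HTMLToMarkdown\HtmlConverter;

/**
 * Convert an HTML fragment (e.g. a feed item's content) to Markdown.
 * clipHtmlToMarkdown is a hypothetical helper name.
 */
function clipHtmlToMarkdown(string $html): string
{
    if (!class_exists(HtmlConverter::class)) {
        throw new RuntimeException('league/html-to-markdown is not installed');
    }

    $converter = new HtmlConverter([
        'strip_tags'   => true,   // drop tags without a Markdown equivalent, keep their text
        'header_style' => 'atx',  // "# Heading" style instead of underlined headings
    ]);

    return $converter->convert($html);
}
```

Calling `clipHtmlToMarkdown('<h2>Title</h2><p>Some <em>text</em>.</p>')` should then return the same content as Markdown, with the heading in `#` style.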
I tweaked my feed reader’s response form to also include the original HTML of a posting as a hidden field (using htmlentities to stuff it into a form field value). I altered the script that processes the form so that it has both a path for posting to websites (using micropub) and a new path for making a note in Obsidian, which is saved as a .md file in the folder where I store all clipped articles.
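The two form changes could be sketched as follows. This is an illustration under assumptions: the field names `original_html` and `target` and both function names are hypothetical, and only the value 'obs' for the Obsidian path comes from the selector described in this post.

```php
<?php
// Sketch only: field names and function names are hypothetical.

/** Build the hidden form field that carries the posting's original HTML. */
function hiddenHtmlField(string $html): string
{
    // htmlentities() escapes quotes, ampersands and angle brackets so the
    // markup cannot break out of the value="..." attribute.
    $escaped = htmlentities($html, ENT_QUOTES, 'UTF-8');

    return '<input type="hidden" name="original_html" value="' . $escaped . '">';
}

/** Decide which path a submitted form takes: an Obsidian note or micropub. */
function routeSubmission(array $post): string
{
    // 'obs' selects the note path; anything else is treated as a site to post to.
    $target = $post['target'] ?? '';

    return $target === 'obs' ? 'note' : 'micropub';
}

$html  = '<p class="intro">Say "hi" & wave</p>';
$field = hiddenHtmlField($html);

echo $field, PHP_EOL;
echo routeSubmission(['target' => 'obs']), PHP_EOL; // note
echo routeSubmission(['target' => 'zyl']), PHP_EOL; // micropub
```

The browser decodes the entities again when it submits the form, so the processing script receives the original HTML in the posted field and does not need to decode it itself.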
To make a note I shape the available input the same way I template clippings from the browser. At the top is my rationale for clipping something and a reference to the source, followed by the original posting, after which I add some keywords as tags and the reference to the source again.
In the images below you see the corresponding elements marked both as they appear in the reader and in the resulting note.
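The template described above can be sketched like this. The exact layout, the array keys, the position of the quote, the hashtag format and the filename pattern are all assumptions for illustration; only the order (rationale and source on top, the posting in the middle, tags and source at the bottom) and the timestamp in the note title follow the description in this post.

```php
<?php
// Sketch only: layout, array keys and filename pattern are assumptions.

/** Assemble the note text from the form input and the converted Markdown body. */
function buildNote(array $clip, string $markdownBody): string
{
    $source = sprintf('[%s](%s) by %s', $clip['title'], $clip['url'], $clip['author']);

    // Turn "php, feed readers" into "#php #feed-readers".
    $tags = implode(' ', array_map(
        fn(string $k): string => '#' . str_replace(' ', '-', trim($k)),
        explode(',', $clip['keywords'])
    ));

    $parts = [
        $clip['reason'],        // why I am saving this, always at the top
        'Source: ' . $source,   // reference to the source
        '> ' . $clip['quote'],  // highlighted quote (placement is an assumption)
        $markdownBody,          // the original posting, converted to Markdown
        $tags,                  // keywords as hashtags
        'Source: ' . $source,   // the reference to the source again
    ];

    return implode("\n\n", $parts) . "\n";
}

/** Note filename: the article title followed by a timestamp. */
function noteFilename(string $title, DateTimeInterface $when): string
{
    // Keep letters, digits, spaces, underscores and hyphens only.
    $safe = trim((string) preg_replace('/[^\p{L}\p{N} _-]+/u', '', $title));

    return $safe . ' ' . $when->format('YmdHis') . '.md';
}

$clip = [
    'title'    => 'An Example Post',
    'author'   => 'Jane Doe',
    'url'      => 'https://example.com/post',
    'reason'   => 'Worth rereading.',
    'quote'    => 'A line to remember.',
    'keywords' => 'php, feed readers',
];

$note = buildNote($clip, 'Converted *Markdown* body.');
$file = noteFilename($clip['title'], new DateTimeImmutable('2021-05-01 10:30:00'));

// Writing the file is left commented out; the vault path is hypothetical.
// file_put_contents('/path/to/vault/newclippings/' . $file, $note);
echo $file, PHP_EOL; // An Example Post 20210501103000.md
```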
The article as shown in my feed reader:
1: the original HTML content from a feed
2: title of the article (prefilled by my feed reader)
3: name of the author (prefilled by my feed reader)
4: original article’s URL (prefilled by my feed reader)
5: the reason and context why I am saving this to notes (also used to write a reply to a post, or the reason for bookmarking something if it will be posted on my site)
6: a quote I want to highlight
7: keywords that will become tags or categories on my site, and tags in my notes
8: selector for which site to post to (zyl is my blog), or ‘obs’ for making a note in Obsidian
Except for that last one, those numbers are marked on the image of the resulting markdown note.
The resulting note in Obsidian:
1: the original HTML content from a feed shown in Markdown as the main body of the note
2: title of the article, both shown as part of the content of the note, as well as the title of the note (where a timestamp is added)
3: name of the author (mentioned with the source both at the top and bottom)
4: original article’s URL (mentioned with the author both at the top and bottom)
5: the reason and context why I am saving this, always at the top as it helps me process the content better
6: a quote I wanted to highlight
7: keywords that have become hashtags
(This posting was also written in my notes and, except for the images, posted directly from Obsidian to my site. This means I can automatically move material into Obsidian as well as out of it. I quite enjoy the feeling of using that ‘magic’.)
@ton So Pandoc isn’t an option?
@frankmeeuwsen hmm, maybe, if I can call it from PHP. In my microsub reader, besides responding to my site via micropub, I also want to be able to choose to save as md in a default folder (newclippings in my Obs vault). Right now I do that by opening the item in the browser and then saving it with my markdownclipper. I’d rather do it directly from microsub.
@ton There is a PHP wrapper. I got it working to convert old posts of mine from Textile to HTML. https://github.com/ueberdosis/pandoc
GitHub – ueberdosis/pandoc: A PHP wrapper for Pandoc to convert any text format in any other text format
@ton And perhaps this extension for saving a web page as md could be a good start for you: https://github.com/deathau/markdownload
GitHub – deathau/markdownload: A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file.
@frankmeeuwsen thanks, that’s worth a try.
@frankmeeuwsen I hadn’t thought of that. It’s actually what I already use in my browser. Perhaps I can reuse its Turndown js component locally.
@ton At https://github.com/frankmeeuwsen/dtd-custom-plugin/blob/9d75472c616a0acd5a104ef908fbae41ae4bf70b/dtd-custom-plugin.php#L610 you can see how easily it can be called from PHP with that wrapper. I just need to find a way to get Pandoc itself onto the server…
@frankmeeuwsen I’ll take a look at it. Thanks!
https://github.com/thephpleague/html-to-markdown is probably the way to go here. (Yes, it uses Composer, but that’s become pretty much the standard now, and for good reason [and the PHP League packages are high quality].)
Thank you Jan, that one goes on my list to explore.
It did indeed work with this script (it turned out I already had Composer).