@aartaka I take the Mozilla Readability approach in @agregore.
It also respects the user's preferences for width, color scheme, and font, which are set browser-wide.
In my experience, though, Readability isn't ideal and sometimes misses content.
I've been thinking of making my own, based on a web scraper tool I made recently.
@aartaka I'm currently doing everything in JS with the built-in DOMParser API. 😅
I think if I come up with a decent algorithm, it'd be easy enough to adapt it to other formats.
At one point I had code that converted HTML to Markdown for a TUI browser, which was silly and fun. 😸
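A rough sketch of the HTML-to-Markdown idea, assuming the page has already been parsed (for example with `DOMParser`). The `MiniNode` interface and `toMarkdown` function are hypothetical names: the interface lists only the DOM `Node` properties the code reads, so a real parsed document should fit it, while plain objects can stand in for testing. Only a handful of tags are handled; anything else is unwrapped.

```typescript
// Minimal structural view of a DOM node. Real DOM nodes (e.g. from
// DOMParser) have these properties; plain objects can mimic them for tests.
interface MiniNode {
  nodeType: number;               // 1 = element, 3 = text (DOM constants)
  nodeName: string;               // upper-case tag name for HTML elements
  textContent: string | null;
  childNodes: ArrayLike<MiniNode>;
  getAttribute?: (name: string) => string | null; // present on elements
}

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Convert each child in order and concatenate the results.
function renderChildren(node: MiniNode): string {
  let out = "";
  for (let i = 0; i < node.childNodes.length; i++) {
    out += toMarkdown(node.childNodes[i]);
  }
  return out;
}

// Convert a small subset of HTML to Markdown. Unknown elements are
// unwrapped: their children are kept and the tag itself is dropped.
function toMarkdown(node: MiniNode): string {
  if (node.nodeType === TEXT_NODE) {
    // Collapse whitespace runs, roughly as an HTML renderer would.
    return (node.textContent ?? "").replace(/\s+/g, " ");
  }
  if (node.nodeType !== ELEMENT_NODE) return "";
  // Scripts and styles never carry readable content.
  if (node.nodeName === "SCRIPT" || node.nodeName === "STYLE") return "";

  const inner = renderChildren(node);
  switch (node.nodeName) {
    case "H1": return `# ${inner.trim()}\n\n`;
    case "H2": return `## ${inner.trim()}\n\n`;
    case "P": return `${inner.trim()}\n\n`;
    case "EM": case "I": return `*${inner}*`;
    case "STRONG": case "B": return `**${inner}**`;
    case "CODE": return `\`${inner}\``;
    case "A": {
      const href = node.getAttribute?.("href");
      return href ? `[${inner}](${href})` : inner;
    }
    case "LI": return `- ${inner.trim()}\n`;
    case "UL": return `${inner}\n`;
    case "BR": return "  \n";
    default: return inner;
  }
}
```

In a browser, something like `toMarkdown(new DOMParser().parseFromString(html, "text/html").body)` would feed it a real document; messy pages would still need extra whitespace tidying.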
@mauve Yes, a portable algorithm (or recipe) for page debloating is the priority here. Implementations come second.
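One step a portable debloating recipe could include is link-density pruning: measure what fraction of an element's text sits inside links, and treat link-heavy blocks (menus, tag clouds, "related" lists) as bloat. Readability uses a link-density measure in its scoring, but the sketch below is a standalone illustration, not its algorithm. The names `MiniNode`, `linkDensity`, and `findLinkHeavy`, and the 0.5 threshold, are assumptions.

```typescript
// Minimal structural view of a DOM node (same fields a real DOM Node has).
interface MiniNode {
  nodeType: number;               // 1 = element, 3 = text (DOM constants)
  nodeName: string;               // upper-case tag name for HTML elements
  textContent: string | null;
  childNodes: ArrayLike<MiniNode>;
}

// Count non-whitespace characters of text under `node`.
// With onlyLinks = true, only text inside an <a> element is counted.
function textLength(node: MiniNode, onlyLinks = false, inLink = false): number {
  if (node.nodeType === 3) {
    if (onlyLinks && !inLink) return 0;
    return (node.textContent ?? "").replace(/\s+/g, "").length;
  }
  if (node.nodeType !== 1) return 0;
  const nowInLink = inLink || node.nodeName === "A";
  let total = 0;
  for (let i = 0; i < node.childNodes.length; i++) {
    total += textLength(node.childNodes[i], onlyLinks, nowInLink);
  }
  return total;
}

// Fraction of an element's text that is link text (0 if it has no text).
function linkDensity(node: MiniNode): number {
  const all = textLength(node);
  return all === 0 ? 0 : textLength(node, true) / all;
}

// Hypothetical pruning step: collect the outermost elements whose link
// density exceeds `threshold` (0.5 is an arbitrary choice, not a tuned value).
function findLinkHeavy(root: MiniNode, threshold = 0.5): MiniNode[] {
  const hits: MiniNode[] = [];
  const walk = (node: MiniNode): void => {
    // Skip non-elements, and skip <a> itself: a lone link is always 100%
    // link text, so links are only judged as part of their container.
    if (node.nodeType !== 1 || node.nodeName === "A") return;
    if (textLength(node) > 0 && linkDensity(node) > threshold) {
      hits.push(node); // flag the whole subtree; no need to descend
      return;
    }
    for (let i = 0; i < node.childNodes.length; i++) walk(node.childNodes[i]);
  };
  walk(root);
  return hits;
}
```

It returns candidates rather than deleting them, so the same logic could drive removal (calling `remove()` on real DOM elements), highlighting, or a port to another language.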
@aartaka @mauve Perhaps https://github.com/vjousse/unmerdify? A tool to reverse enshittification, or at least the bloat on web pages, with a French name.
@pjacock The tool itself is less interesting than its content-extraction rule sets! The configs at https://github.com/fivefilters/ftr-site-config are a treasure trove of rules for getting at the essence of a page. I’ve been looking for something like that for a long time. Thank you!
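To reuse those rule sets outside the original tool, the config files first need parsing. This sketch assumes the plain `directive: value` line format the files use, with `#` comment lines and repeatable directives such as `body` and `strip`. `parseSiteConfig` is a hypothetical name, and the parser only groups values by directive; it does not interpret what any directive means.

```typescript
// Directives from one site-config file, e.g. { body: ["//article"], strip: [...] }.
// Directives like `body` and `strip` can repeat, so each key maps to a list
// of values kept in file order.
type SiteConfig = Record<string, string[]>;

function parseSiteConfig(text: string): SiteConfig {
  const config: SiteConfig = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or comment
    // Split on the FIRST colon only, so values such as URLs or XPaths that
    // contain colons survive intact. (Assumption: directive names themselves
    // never contain a colon.)
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // not a `directive: value` line; skip it
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    if (!config[key]) config[key] = [];
    config[key].push(value);
  }
  return config;
}
```

The XPath values could then be run against a parsed page with the browser's `Document.evaluate`, e.g. `doc.evaluate(xpath, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null)`.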
@mauve If I can drag myself away from designing mechanical keyboards while I'm unemployed with time on my hands, this could be a fun learning project. 🤔
@mauve Yes, Readability is imperfect because it focuses on plain long-form articles. It needs remixing if we’re to do something more generic with it.
There’s a niche for a “website cleanup” scraper/simplifier, and I keep stumbling into it. Maybe I should make a C library for HTML simplification? 🤔