freelabs federation

 free computing, free culture, free hardware


[lit]

Outfoxing the Web

[lit] [replacing]Thats/That's,thats/that's,microsoft/Microsoft,grace/Grace,hoppers/hopper's,hopper/Hopper,general public license/General Public License,united states constitution/United States Constitution,united states/United States,free software definition/Free Software Definition,java/Java, turing/ Turing,github/Github,roman/Roman,oprah/Oprah,openoffice,OpenOffice,apache software foundation/Apache Software Foundation,isnt/isn't,halloween/Halloween,apple/Apple,ibm/IBM,free software/Free software,red hat/Red Hat,gnome/GNOME,debian/Debian,Whats/What's,nokia/Nokia,linux foundation/Linux Foundation,"i /"I ,theyll/they'll,theyre/they're,windows/Windows,lxde/LXDE, os / OS ,gnu operating system/GNU Operating System,linux/Linux,gnu/GNU,drm/DRM,Drm/DRM,pale moon/Pale Moon,icecat/IceCat,cecat/ceCat,mozilla/Mozilla,firefox/Firefox,ozillas/ozilla's,dns/DNS,-https/-HTTPS,(doh)/(DOH),libreoffice/LibreOffice,oracle/Oracle,sun mic/Sun Mic,october/October,canonical/Canonical,open source initiative/Open Source Initiative,eric raymond/Eric Raymond,raymond/Raymond, osi/ OSI,bruce/Bruce,perens/Perens, ian/ Ian,murdock/Murdock,unix/Unix,dont/don't,gpl/GPL,stallman/Stallman,richard/Richard,linus/Linus,torvalds/Torvalds,christmas/Christmas,volkswagon/Volkswagon,alan/Alan, tom / Tom ,asnt/asn't,githug/Githug, im / I'm ,youre/you're,bourne/Bourne, ada / Ada ,lovelace/Lovelace, mit/ MIT,daniel/Daniel,quinn/Quinn,saddam/Saddam,hussein/Hussein,Weve/We've,january/January,x.org/X.org, cd/ CD,copyright act/Copyright Act,moon/Moon,Ive/I've,python/Python,wouldnt/wouldn't,didnt/didn't,couldnt/couldn't,bash/Bash[replacing][fixg] continuing the theme of [[software-disobedience]], the "web" is really two things: a protocol, and a hopeless set of browser standards. "hopeless" as we will never practically be able to create new web clients (browsers) that comply with these completely outlandish, bloated demands. if not for this, it would be hard to argue against the idea of browsers being standardised. the benefits of standardisation are well known. but a standard that thwarts free software cant easily be argued to help free software, and to top it off, the web standard now includes drm. it's time for protest. the best protest against an oppressive standard is to use something else. nobody really thinks thats an option at this time, nor is it likely to be anytime soon; the web is here, that cant be helped. it's likely a given that we are going to use web browsers. what we can do to protest, is use alternatives whenever possible-- or at least whenever practical. if you dont want every feature in the world to be wrapped up and held hostage in a single application like a browser, it is absolutely necessary to recognise features that could be implemented and used in other, smaller applications. we dont know this for sure, but it's likely that most javascript applications for example, are going to require a browser. it's also possible that we could drag a javascript implementation into some other application, making it possible to implement some javascript features in other contexts. technically, node.js already does this-- it implements javascript features in a server-side context. for anybody that thinks this overall approach is stark lunacy, there is a precedent: gopher. gopher has its own protocol and its own standard, which is minimal. as to the overall approach, ive written a gopher client before. it isnt difficult, in fact it's a fun project. ive also modified an existing gopher client, to add a feature. 
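to show how minimal gopher really is, here is a rough sketch (purely illustrative, python 3, with a made-up function name) of fetching a page or menu by hand:

import socket

def gopher_fetch(host, selector="", port=70):
    # connect, send the selector followed by CRLF, then read until the
    # server closes the connection-- that is the entire gopher request
    s = socket.create_connection((host, port))
    s.sendall(selector.encode("utf-8") + b"\r\n")
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks).decode("utf-8", "replace")

print(gopher_fetch("gopher.floodgap.com"))

a real client adds menu parsing and error handling on top of this, but that is the whole transaction: a selector, a carriage return and newline, and whatever the server sends back before it hangs up.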
modifying an existing client isnt difficult either; in fact it demonstrates, as a fine example, what sort of fun and interesting alternatives to the web we can have. gopher-over-http proxies exist, and http-over-gopher proxies are possible, but gopher is just one example of what we can do to gradually diminish the de facto monopoly of the web. the big question being asked here is this: although we have treated this virtual monopoly as a benefit, are we really certain we want it as-is? and if we dont, what little can we do about it? gopher is a valid example, and ive supported it in the past. i hope this article, which offers it as one example of a solution, will be taken as supportive as well-- though i dont think we need to stop with a single example. it's no secret in the gopher community that gopher isn't going to "rise again" and take over where the web left off; some might even consider that a feature. but if we are looking for more than one alternative to the web, it's nice to know one already exists. and i think the narrative im putting forth is one that could help promote gopher, as well as promote free software.

based on my (limited) experience making a gopher client, i can outline and begin to implement a prototype example of a web browser alternative. i offer the design ideas here in hopes of inspiring someone to think about this and either adapt my work, or develop their own ideas-- because like gopher, whats being offered here is simply another example. the idea is not to take everything in a singular new direction, but to make it possible to have smaller clients, simpler standards (if any) and more modest goals for online experience-- as with gopher. for my own design, rather than start with the sheer simplicity of gopher and bolt onto that, i want to start with the web and work from the other direction-- deciding what i want to implement first.

on the practical side of things, ive seen some very sophisticated tools for grabbing text from the web. in programming language parsing and in processing html, it is possible to do very powerful and sophisticated things with abstract syntax trees and the document object model. countless tools exist for working with these-- my approach has always been deliberately less sophisticated. i want some things to be simpler than all that. but i also want to have tools that are easier to create, easier to automate, easier to customise without a terribly complicated model. it really comes down to parsing text.

neither gopher nor the original web client offered inline images. in my opinion, we want things like inline images to be optional. since you can turn that feature off in web browsers (no thanks at all to alex limi, who decided it needed to be more difficult because /shameless corporate propaganda/) it is technically optional anyway. as a web client, xombrero made it possible to decide per-website when javascript would load, and possibly when other features like cookies or images would load. technically (but tediously) we can do that with firefox, and we want that to be possible for us as well. so whether an *img* tag produces an inline image (where such things are even supported, because im more likely to work on a text-based client) or a link would depend on how we parse it, and on the design and configuration of the client. ive used lots of text-based browsers, including lynx and links and elinks2. i particularly love that elinks can do cgi without requiring the user to run a server.
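as a sketch of the "img tag becomes a link" idea-- deliberately unsophisticated, plain string matching rather than a document object model, and not part of any finished client:

import re

def img_to_link(html):
    # replace each inline img tag with a plain, visible reference the user
    # can choose to follow (or ignore)-- no tree, no DOM, just text
    return re.sub(r'<img[^>]*\bsrc="([^"]*)"[^>]*>', r'[image: \1]', html)

print(img_to_link('a photo <img src="cat.png" alt="a cat"> of a cat'))

real html needs more handling than this (single quotes, unquoted attributes) but the point stands: the client, not the page, decides what an img tag turns into.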
the most complex stuff ive written (and i dont consider it all that complex) i wrote by starting out simple, and gradually adding to each part of the program. i would spend time thinking about design, and design is important, but i would try to have it do something useful each step of the way. the first thing i want is something that parses text from the web, and does something useful with it. i know that for my design, i want it to be more interactive than just the command line-- for an interactive online client, curses is an easy choice. i would consider urwid if it were more common; urwid seems like it could be ideal for this task. but curses is more familiar and more readily available. a lot of what im working on has nothing to do with the user interface anyway. the first task is to make curses load and print text. i want to do all this in python of course, so that its easier for people to modify and so it doesnt have to be recompiled. i would like to have "tabs" like elinks does, but they aren't a priority. i want a text-parsing client first, and tabs in a text-based client arent too tricky to add if you have the other stuff working.

the goal is to separate features out into "plugins" in the form of python modules. we could offer plugins for input-- such as for gopher and http and https, or for loading local files and directories. such plugins could determine whether gopher is handled via curl-- which, unlike wget, supports gopher, but which (also unlike wget) is based on github, and thus controlled by microsoft. we definitely want curl to be an optional component, not a requirement. fig, as an analogue, is one of my programming languages-- pygame is based on github, and it is the only component of fig that makes github unavoidable. fig works without pygame, so the only github-based component is completely optional. cpython is also based on github, sadly-- but without pygame, fig works with pypy instead (not github-based, and continues to support python 2.) we want this client to similarly run without any github-based components. but input is fairly modest to implement, and it's trivial to have our client switch to wget when curl is unavailable or to urllib if wget is unavailable (sketched below), so input plugins are a low priority. how we parse text quickly builds into something we want and need to customise, so output plugins are a higher priority if we want the user to be in as much control as possible. (it also makes the task of managing output far less cumbersome for us, not just other hypothetical "users").

my initial goal, partly implemented, is to have a text-based, gopher-style client for parsing html. it doesnt have to parse only html, though that is an initial goal. within an html and javascript implementation, i do a lot of work in a syntax i devised for my own convenience called freewiki, and that is the system this page is displayed with. freewiki is less cumbersome for me than html, and demands less of me in terms of typing and editing. so it would be reasonable for me to implement at least a partial freewiki processor as an optional plugin for the html processor.
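on the input side, here is roughly what the wget-then-urllib fallback mentioned above could look like-- a sketch only, python 3, with made-up function names, and deliberately leaving curl out:

import subprocess
from urllib.request import urlopen

def have(cmd):
    # crude check for an external downloader on the path
    try:
        subprocess.call([cmd, "--version"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except OSError:
        return False

def fetch(url):
    # prefer wget; fall back to urllib if wget is not installed
    if have("wget"):
        return subprocess.check_output(["wget", "-q", "-O", "-", url]).decode("utf-8", "replace")
    return urlopen(url).read().decode("utf-8", "replace")

a curl plugin (for gopher urls, for anyone who doesnt mind where curl is hosted) would slot in the same way, as just another optional input plugin.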
for the gopher-style client, the design is modest-- strip html tags, but hold onto anchor tags if they contain href, and process those tags (which may include relative links such as "[lit]/wiki/gopher[lit]" for wikipedia or "[lit]../CGI-BIN/index.html[lit]") with the base href to produce full urls-- put each of these urls at the beginning of their own line, but allow trailing text, like this:

http://example.com/index.html click here to go to the home page

i call this "mode" of parsing "gopherparse" and intend for it to be a plugin, as well as the default plugin for the client when another is not specified. it strips html and converts relative urls into full urls. it puts each link on its own line, just as gopher does with selectors (what links are called in the gopher world). and you navigate by moving up and down (with arrow keys in this client) to go from the previous selector to the next, or vice versa, before hitting enter.

initially, the client is implemented with the gopherparse feature. it already recognises freewiki content, if the content is strictly demarcated (it looks for the span tag that has the id "wiki"; this could easily lead to false positives or missed detection) and if it is in freewiki mode, it treats newlines as br tags (a feature of freewiki) and strips freewiki url tags-- we want it to replace them with newlines, so the client can process those the same way it processes anchor tags. since this could interfere with other content, we definitely want the freewiki parser to be optional-- and it depends on (assumes the use of, doesnt work without) the gopherparse feature. so while gopherparse is the default, and freewiki is likely to be enabled by default unless it causes too many problems, we want a way for the user to control all of this.

weve already described how the client works by default-- it treats the web like gopherspace, and creates a gopher-like client for webpages. however, one of the key ideas is to have it go beyond that capability and be extensible. some emacs fans are probably thinking "why not just use emacs?" thats an easy one-- i respect emacs, i recognise its superiority and importance-- but ive tried it and i dont want to use it. what about vim, then? ive tried vim too, but i only like it slightly more than emacs (im not saying its better, i like it slightly more despite knowing it is technically inferior; heck, i even like nano a lot, what does that tell you?) and vim is sadly based on github. im trying to, where possible, help people get away from github.

once the gopher-style client is complete, the next step is to create a configuration and plugin setup that makes the defaults optional. this means that primary plugins like gopherparse can be chained together like a pipeline or disabled, and secondary plugins (the ones that depend on primary plugins) like freewiki can be turned on or off depending on what input is loaded. this is not yet implemented, but having designed fig i think it is very possible to make a simple plugin framework in the style of a minimalist, pipeline-based scripting language. we want it to be friendly enough for non-programmers to use it, and for programmers or command line users to love it. as ive worked on the gopherparse functionality and a few other features, ive thought about how to make this work.
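here is a very rough sketch of the gopherparse idea-- not the actual plugin, just an illustration using python 3's html.parser and urljoin, with a made-up class name:

from html.parser import HTMLParser
from urllib.parse import urljoin

class GopherParse(HTMLParser):
    # strip tags, but give every anchor its own line: the full url first,
    # with the link text trailing after it-- the way gopher lists selectors
    def __init__(self, base):
        HTMLParser.__init__(self)
        self.base = base
        self.out = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # resolve relative links like /wiki/gopher against the base href
                self.out.append("\n" + urljoin(self.base, href) + " ")
    def handle_data(self, data):
        self.out.append(data)

p = GopherParse("https://en.wikipedia.org/wiki/Web")
p.feed('see the <a href="/wiki/gopher">gopher</a> article')
print("".join(p.out))

the real thing also needs to find the base href in the page, recognise freewiki content and so on-- but this is the core of it.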
so consider this the design outline: first, we look at existing functionality:

[lit]*[lit] gopherparse (freewiki)
[lit]*[lit] text filter (.outfoxfilter configuration)
[lit]*[lit] file output (to outfox.txt)
[lit]*[lit] option to return a space for stripped tags, instead of nothing (sometimes this is useful).

all of these features are presently implemented in the fastest, most trivial way possible, so i can already make use of them. none of them are implemented in the most elegant or organised way yet. the design for the client is one that can probably be implemented in a month or less. so far ive put a couple days (at least a few hours per day) into the client, and im basing my estimate on my experience with both quasi (a fig-like language implemented in javascript) and making a gopher client. since im not putting the amount of work into this that i put into quasi (an almost daily effort over the course of a month or so) i wont make any promises about actually doing this. but i think its possible to design this in a way that it is easy to implement. i really like simple, easy-to-implement designs. if you can make it even simpler, i applaud you for your minimalist ingenuity.

currently .outfoxrc is loaded, but it isnt parsed in a way that makes this next stage of development possible. with this design we are already straddling the line between a configuration file and a very simple programming language, and thats a lot of fun. basically we want the option of creating a simple command pipeline for simple regexes (not even regexes, but basic string matching) on urls we tell the client to load. so first we design our pipeline based on the features we already have, in the hopes that it will be both simple and sophisticated enough (strike the right balance) for being an extensible client. we want this to be easy to create, but we want it to be able to do some neat things if possible. we are basing this all on the idea of parsing text (whether its html, or a simple language like fig) and its very possible that we are recreating mime here in a very strange way. if so, thats incidental. mime is very useful, but im not in love with it.

the default behavior for the client presently can be described like this:

gopherparse freewiki

we could use pipes for this pipeline, but im inclined to use semicolons. the option to return a space for stripped tags depends on gopherparse, and is called tag2spc:

gopherparse freewiki tag2spc

file output does not depend on another plugin, so when we want to do that we could put it as a separate command in the pipeline:

gopherparse freewiki tag2spc ; filesave

filesave would always output to outfox.txt, but we could make it accept an optional filename. thats not a priority for me; making it output to outfox.txt means that outfox.txt is the only file we need to worry about being clobbered during browsing. it is mostly a thing i added for using outfox in a script, so it can be copied after use. this could lead to a race condition, so having a stdout-only version of filesave called "dumptext" would be useful:

gopherparse freewiki tag2spc ; dumptext
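parsing a pipeline like this is simple enough to sketch in a few lines-- the names here (and the plugins dict) are hypothetical, but it shows the shape of it: split on semicolons for separate commands, then on whitespace for a primary plugin and its secondary plugins:

def parse_pipeline(line):
    # "gopherparse freewiki tag2spc ; filesave" becomes
    # [["gopherparse", "freewiki", "tag2spc"], ["filesave"]]
    return [cmd.split() for cmd in line.split(";") if cmd.split()]

def run_pipeline(line, text, plugins):
    # each command is a primary plugin followed by its secondary plugins;
    # the text flows through the commands in order
    for cmd in parse_pipeline(line):
        primary, extras = cmd[0], cmd[1:]
        text = plugins[primary](text, extras)
    return text

print(parse_pipeline("gopherparse freewiki tag2spc ; filesave"))

the plugins dict would just map names to functions; thats about all the "framework" really needs to be.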
already, with this arrangement, we have new, unintended options-- instead of dumping text after running gopherparse, the user can run dumptext alone to get the unmodified file:

dumptext

we could also have a cache feature, which not only saves the output to a file (lets put the file in a folder called "outfoxcache" to leave the rest of the filesystem alone) but makes it possible to load the cached file offline:

cache ; gopherparse freewiki ; filter ; cache

now look at what weve done here... two caches in two places. if we want we can just do one or the other. but lets say this is what the user wants. if we go to [lit]http://techrights.org/feed[lit] (which is xml by the way-- we dont have an xml parser so this is going to be treated like we treat html as the default) this is how the cache plugin might work:

* first, it downloads [lit]http://techrights.org/feed[lit] and caches that file here: [lit]./outfoxcache/http/techrights.org/feed/cache[lit]
* then, it runs gopherparse with the freewiki secondary plugin and caches that, to [lit]./outfoxcache/http/techrights.org/feed/cache_gopherparse-freewiki_filter[lit]

if in the future the same url is loaded and returns nothing, it can offer to load the cache from [lit]./outfoxcache/http/techrights.org/feed/cache[lit] and if there isnt one, it can still offer again for any request that uses cache ; gopherparse freewiki ; filter -- it might not have a cache from earlier in the pipeline, but if it has a cache from later, it can offer that. whether this cache is used to load pages offline or not, the important thing about the cache feature is saving a copy when the user wants one.

"filter" is a feature that removes lines that contain string matches from .outfoxfilter. if you run filter after cache, it will cache the unfiltered version and display the filtered version. if you run cache after filter, it will cache the filtered version and then display the filtered version.

another plugin i would like is grep. though it would perhaps be better to make it use regexes, normally when i implement something called "grep" i just mean basic string matching. ive used real regexes, i promise:

gopherparse freewiki ; filter ; grep ; cache

this would gopherparse html and freewiki content (if found) then filter against .outfoxfilter and let the user input a line of text to search for. after displaying only the lines that matched, it would cache the output. so this is how pipelines would work for describing the default text parsing behavior of the client. and since input plugins (which decide how to get the content) are a low priority, we are modeling this based on output plugins-- how to treat the content once we have retrieved it.
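the cache layout described above is easy to sketch too-- again, just an illustration with made-up names, mapping a url (and optionally the stages already run) to a path under outfoxcache:

import os

def cache_path(url, stages=None):
    # http://techrights.org/feed -> outfoxcache/http/techrights.org/feed/cache
    # with stages ["gopherparse freewiki", "filter"] the filename becomes
    # cache_gopherparse-freewiki_filter, so each point in the pipeline gets its own file
    scheme, rest = url.split("://", 1)
    name = "cache"
    if stages:
        name += "_" + "_".join(s.replace(" ", "-") for s in stages)
    return os.path.join("outfoxcache", scheme, rest.strip("/"), name)

print(cache_path("http://techrights.org/feed"))
print(cache_path("http://techrights.org/feed", ["gopherparse freewiki", "filter"]))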
but i think we could have two configurations:

tag2spc 0
default
input: wget, urllib # (try wget first, use urllib if wget is not found-- dont try curl)
output: gopherparse freewiki ; filter

for the client we would have a hardcoded default for when .outfoxrc is not found; perhaps we could call it outputcfg in a variable:

outputcfg = "gopherparse freewiki ; filter"

the "default" setting in .outfoxrc would naturally override the hardcoded default in the client:

default
output: gopherparse freewiki ; filter ; cache

but suppose we dont want to cache anything from techrights:

default
output: gopherparse freewiki ; filter ; cache

://techrights.org/
output: gopherparse freewiki ; filter

or we want to cache only for things from techrights and the stallman account (which im making up hypothetically) on notabug.org:

default
output: gopherparse freewiki ; filter

://techrights.org/
output: gopherparse freewiki ; filter ; cache

https://notabug.org/stallman/
output: gopherparse freewiki ; filter ; cache

but suppose for some bizarre reason, we want to force the curl plugin to use gopher mode to try to load stallman's notabug account, even though that will fail:

default
output: gopherparse freewiki ; filter

://techrights.org/
output: gopherparse freewiki ; filter ; cache

https://notabug.org/stallman/
input: curl gopher
output: gopherparse freewiki ; filter ; cache

this design has some limitations, but goes far beyond the typical gopher client in terms of possibilities. it can be used to handle gopherspace, as well as treat web pages more like a gopher client-- or who knows what kind of features it could have?

pluginname = "gopherparse"
# only accept a string of consecutive alphanumeric characters, starting with a letter
if sanitisepluginname(pluginname): exec("import " + pluginname)
# gopherparse at runtime would try to import the freewiki plugin, and ignore it if it didnt load

some features would only work if hooked in a way that required the client to support the plugin-- some features could be added without changing the client, others would require minimal alterations which would have to be supported by the client maintainers. of course it would be easier to fork the client to support new features as well, as this is designed to be a project that a small number of people or an individual can maintain. any infrastructure that made it trivial to implement any imaginable feature without ever requiring modification to the client would create far more complexity than i can imagine implementing. so im content to make it support plugins that change content without client modification, and that add functionality to the client with minimal alterations that the user can control by either not configuring the plugin, or by deleting the plugin (at which point it will not load, and the client will pretend it doesnt exist.)

imagine if your web browser gave you this much control over your browsing, via a simple config file and being mostly a scripted application that didnt require compilation to make major alterations to. of course i also support the idea of a fully scriptable web browser. but thats a lot more to ask for, especially when you want it to run in python, handle different things differently depending on domain name or even the path of a url.
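and here is one way the per-url configuration above could be parsed and matched-- a sketch only, with hypothetical function names, where the longest matching pattern wins and everything else falls back to the "default" section:

def load_outfoxrc(text):
    # sections start with "default" or a url pattern; "input:" and "output:" lines follow
    rules, pattern = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.split(":", 1)[0] in ("input", "output"):
            key, value = line.split(":", 1)
            rules.setdefault(pattern, {})[key] = value.strip()
        else:
            pattern = line
    return rules

def pick(rules, url, key, fallback):
    # the most specific (longest) matching pattern wins; otherwise use "default"
    best = ""
    for pattern in rules:
        if pattern and pattern != "default" and pattern in url and len(pattern) > len(best):
            best = pattern
    section = rules.get(best) or rules.get("default", {})
    return section.get(key, fallback)

cfg = load_outfoxrc("default\noutput: gopherparse freewiki ; filter\n://techrights.org/\noutput: gopherparse freewiki ; filter ; cache")
print(pick(cfg, "http://techrights.org/feed", "output", "gopherparse freewiki ; filter"))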
these are probably good ideas for web browsing, but as already said-- standards-compliant web browsers arent necessarily something we want to fully support, if we want it to be possible for people to create free software alternatives, avoid microsoft-controlled github, or boycott compliance with drm on the web.[fixg]

figosdev, march 2020

home: [lit]https://freelabs.neocities.org[lit]