A proxy server that works like an ad blocker: you define rules to remove blocks of HTML from a page, but it also lets you add blocks to a page and even replace its CSS with your own, so that modern websites can be “recreated” in a form that old browsers can render. This is kind of the next evolution of things like WebOne.
The inspiration for this project came after I was literally unable to browse most websites on an iPad 2 running iOS 9, an OS from 2015 with a proper WebKit-based browser for its day. Even with a new set of SSL certificates, it struggled to do anything useful with modern websites. It was utterly unacceptable.
We probably want this to be a self-hosted solution in order to avoid high bandwidth costs, but we should have a public page where users can contribute their own mods to websites, so that people don't have to modify every single site on their own.
Have a fixes folder in the repo that contains the files that will be used to “fix” a webpage.
That directory will have subfolders named after the domain of the website to fix. Subdomains can have their own folders there too. The app will first try to match the full subdomain and will then fall back to the main domain and try that before giving up.
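The lookup order described above can be sketched roughly as follows. This is only an illustration: the class and method names (`FixFolderResolver`, `candidates`, `resolve`) are hypothetical, and walking up one label at a time is an assumption that generalizes the "subdomain first, then main domain" rule. It never tries a bare TLD such as `com`, but it does not know about multi-part suffixes like `co.uk`, so it may do one harmless extra lookup there.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class FixFolderResolver {

    /** Folder names to try for a host, most specific first. */
    static List<String> candidates(String host) {
        List<String> names = new ArrayList<>();
        String current = host.toLowerCase();
        // Stop once only a single label (e.g. "com") is left.
        while (current.contains(".")) {
            names.add(current);
            current = current.substring(current.indexOf('.') + 1);
        }
        // Single-label hosts such as "localhost" are tried as-is.
        if (names.isEmpty()) {
            names.add(current);
        }
        return names;
    }

    /** Returns the first existing fix folder under fixesRoot, if any. */
    static Optional<Path> resolve(Path fixesRoot, String host) {
        for (String name : candidates(host)) {
            Path dir = fixesRoot.resolve(name);
            if (Files.isDirectory(dir)) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, a request for `m.news.example.com` would look for `fixes/m.news.example.com`, then `fixes/news.example.com`, then `fixes/example.com`, and give up if none of them exist.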
Fixes will have levels, so that you can serve a page to a modernish browser and also serve a similar page to a completely ancient Netscape thing. Each level will execute different scripts, will be represented by a priority number, and will be passed by the client. If there is no fix for the requested number, the app will move to the closest available one, and if no levels are available at all it'll just pass the raw page after fully loading it.
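A minimal sketch of the "closest level" fallback, assuming levels are plain integers. `LevelSelector` and `closest` are hypothetical names, and the tie-break (prefer the lower number when two levels are equally close) is an assumption; the notes above don't say which side should win.

```java
import java.util.OptionalInt;
import java.util.Set;

final class LevelSelector {

    /**
     * Picks the available level nearest to the requested one.
     * Returns empty when no levels exist, in which case the caller
     * would pass the raw page through.
     */
    static OptionalInt closest(Set<Integer> available, int requested) {
        OptionalInt best = OptionalInt.empty();
        for (int level : available) {
            if (!best.isPresent()) {
                best = OptionalInt.of(level);
                continue;
            }
            int distance = Math.abs(level - requested);
            int bestDistance = Math.abs(best.getAsInt() - requested);
            // Assumption: on a tie, prefer the lower level number.
            if (distance < bestDistance
                    || (distance == bestDistance && level < best.getAsInt())) {
                best = OptionalInt.of(level);
            }
        }
        return best;
    }
}
```

An exact match has distance zero, so it always wins without needing a special case.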
Each fix folder will contain a config file (YAML?) that'll list a bunch of things to do. First it'll have a block(deny?)list that'll function as an AdBlocker, to make our renders easier and faster. This will simply be a list of regular expressions: if a request's href matches one of them, the request will be blocked before it is passed to the driver.
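The blocklist check could look something like the sketch below. `BlockList` and `blocks` are hypothetical names, and using a partial match (`find()`) instead of requiring the regex to match the whole href is an assumption; loading the patterns from the config file is left out.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class BlockList {

    private final List<Pattern> patterns;

    /** Compiles the regexes once so each request only pays for matching. */
    BlockList(List<String> regexes) {
        this.patterns = regexes.stream()
                .map(Pattern::compile)
                .collect(Collectors.toList());
    }

    /** True if any pattern matches somewhere in the href. */
    boolean blocks(String href) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(href).find()) {
                return true;
            }
        }
        return false;
    }
}
```

A blocked request would then be answered directly by the proxy (for example with an empty response) without ever reaching the driver.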
Another part of this config will be the scripts definition, which will contain a series of regexes for hrefs, each with an associated script. If an expression matches, its script will be executed on the page. The app will go through all of the items on the list, since more than one script may have to be executed.
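As a rough sketch of the scripts definition, the rules can be kept as an ordered list and every matching script collected, not just the first. `ScriptRules`, `Rule`, and `scriptsFor` are hypothetical names, the script file names in the usage note are made up, and running scripts in config order is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ScriptRules {

    /** One config entry: an href regex and the script to run on a match. */
    static final class Rule {
        final Pattern pattern;
        final String script;

        Rule(String regex, String script) {
            this.pattern = Pattern.compile(regex);
            this.script = script;
        }
    }

    private final List<Rule> rules;

    ScriptRules(List<Rule> rules) {
        this.rules = new ArrayList<>(rules);
    }

    /** Returns every script whose regex matches the href, in config order. */
    List<String> scriptsFor(String href) {
        List<String> matched = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.pattern.matcher(href).find()) {
                matched.add(rule.script);
            }
        }
        return matched;
    }
}
```

For example, with rules for `example\.com` (running `strip-nav.js`) and `/article/` (running `simplify-article.js`), a request for `https://example.com/article/42` would get both scripts, while the site's front page would only get the first.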
This approach makes it super simple to implement everything that we want.
Also think of a good and maintainable way to implement the levels in the config file.
Maybe have a list called replace: if an href matches a regex in it, that asset will be replaced by our proxy on the fly.
Do TLS termination, of course.
Ensure that the Chrome engine is only spawned when there's a script associated with the request; otherwise handle everything internally without resorting to Selenium. Alternatively, have some sort of configuration that lists which URLs pass straight through, so that we won't be spawning a full Chrome instance just to fetch a CSS file. Another approach could be to pass everything straight through and only spawn Chrome for matches in scripts or in a special “must use Chrome” match list.
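The last option above (pass through by default, use Chrome only on a match) might be sketched like this. `RequestRouter`, `Route`, and the three list parameters are hypothetical names, and checking the blocklist before anything else is an assumption about ordering.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class RequestRouter {

    enum Route { BLOCK, BROWSER, PASS_THROUGH }

    private final List<Pattern> blockList;
    private final List<Pattern> scriptPatterns;
    private final List<Pattern> mustUseChrome;

    RequestRouter(List<String> blockList,
                  List<String> scriptPatterns,
                  List<String> mustUseChrome) {
        this.blockList = compile(blockList);
        this.scriptPatterns = compile(scriptPatterns);
        this.mustUseChrome = compile(mustUseChrome);
    }

    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    private static boolean matchesAny(List<Pattern> patterns, String href) {
        return patterns.stream().anyMatch(p -> p.matcher(href).find());
    }

    /** Decides how a request is handled without starting a browser first. */
    Route route(String href) {
        if (matchesAny(blockList, href)) {
            return Route.BLOCK;
        }
        if (matchesAny(scriptPatterns, href) || matchesAny(mustUseChrome, href)) {
            return Route.BROWSER;
        }
        // Default: fetch directly, e.g. for plain CSS or image assets.
        return Route.PASS_THROUGH;
    }
}
```

Because the decision only involves regex matching, it can run on every request cheaply, and the Chrome instance only needs to exist once some request actually resolves to `BROWSER`.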
Our stack will glue several technologies together to reach our goal in the easiest and fastest way possible, while still leaving room for us to improve in the future.
We'll have a SOCKS4 proxy server that will redirect all the requests to an internal Java HttpServer (or HttpsServer if the requested port is 443). That server will do the SSL termination if needed, reconstruct the original request URL, and gather all of the request headers. If the request is a GET it'll be passed to Selenium in order to fetch whatever is needed in a modern way, and BrowserMob Proxy will replace the request headers with the ones from our original request (just don't mess with the User-Agent). If the request is anything else we'll perform it ourselves using Java's HTTP client library. The response of all of this will be sent back to the proxy client.
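The request-handling steps above can be sketched as a few pure helpers, kept separate from the server classes so they are easy to test. `OriginalRequest` and its method names are hypothetical. The sketch assumes the host comes from the `Host` header and the path and query from the request target, which with `com.sun.net.httpserver.HttpExchange` would be read via `getRequestHeaders()` and `getRequestURI()`.

```java
import java.util.Map;
import java.util.TreeMap;

final class OriginalRequest {

    /** Rebuilds the URL the client originally asked for. */
    static String reconstructUrl(boolean tls, String hostHeader, String requestTarget) {
        String scheme = tls ? "https" : "http";
        return scheme + "://" + hostHeader + requestTarget;
    }

    /** GET requests go to the browser; everything else to the HTTP client. */
    static boolean goesThroughBrowser(String method) {
        return "GET".equals(method);
    }

    /**
     * Headers to copy onto the browser's request. The User-Agent is left
     * out so the modern browser keeps its own.
     */
    static Map<String, String> forwardableHeaders(Map<String, String> original) {
        // Header names are case-insensitive, so compare them that way.
        Map<String, String> copy = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        copy.putAll(original);
        copy.remove("User-Agent");
        return copy;
    }
}
```

Keeping the User-Agent out of the copied headers means the upstream site sees the modern browser, not the old client, which is presumably the point of the "don't mess with the User-Agent" note.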