
Dynamic websites are easy, and you can make one too

Date: 2025-02-15 17:15
Tags: this-web-server, just-tell-the-computer-what-you-want-it-to-do

In the last few years there's been a movement back towards static HTML sites, as a reaction to extremely bloated websites with 2MB of minified JavaScript on every page, and that's cool. There are advantages to static sites - they can be hosted anywhere, even file://, and they guarantee that every request can be served quickly. There's been a proliferation of static site generators such as Hugo.

But I've also seen them viewed as the fundamental layer, a lower layer on top of which everything else sits, like the Internet equivalent of assembly code. This is not actually the case. The web is built on a request-response paradigm, and the request goes to your server, and your server can generate the response any way you want it to. Indeed, a basic strategy is to just look for a file with the name mentioned in the request. It's not the only strategy. The server is already running code to service the request, and if your web server is anything more than incredibly basic, it gives you the opportunity to insert your own code into the pipeline. Sometimes it's via connecting to a separate daemon process, sometimes it's via launching a program from disk when the program file is requested, sometimes it's via dynamically loading a library into the web server process, and sometimes it's via a scripting language embedded in the web server process. Even python3 -m http.server has a limited ability to run CGI scripts. Extension mechanisms aren't the same in all web servers, so look up what's supported by yours.
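
As a sketch of the launch-a-program-from-disk flavour (classic CGI), here's roughly what such a program looks like in C. This isn't taken from my server; it's just the minimal shape of any CGI program: request metadata arrives in environment variables, and whatever goes to stdout becomes the response.

    /* Minimal CGI program: the web server fills in environment
     * variables, runs this executable, and relays its stdout to
     * the client. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const char *method = getenv("REQUEST_METHOD");  /* e.g. "GET" */
        const char *agent  = getenv("HTTP_USER_AGENT"); /* the client's User-Agent header */

        /* CGI output: response headers, a blank line, then the body. */
        printf("Content-Type: text/plain\r\n\r\n");
        printf("You made a %s request with %s\n",
               method ? method : "(unknown method)",
               agent  ? agent  : "(no User-Agent)");
        return 0;
    }

Compile it, drop it wherever your server expects CGI programs (python3 -m http.server --cgi, for instance, looks under cgi-bin/, on versions that still ship CGI support), and you have a dynamic page.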


There are some good reasons to treat a site as a collection of static files, of course - for example, if you want to host the same site as an IPFS directory or as a downloadable zip file. But the majority of websites do not have such requirements. And you can still produce a separate static variant of your website if you want to. You might even find it most convenient to do this using the same process that generates pages dynamically.


What am I running? I'm using nginx as a front end (or "reverse proxy") to terminate SSL and route requests to different backends based on the host and path. The same server hosts several applications, so a shared front end is needed, and nginx does this well. For example, if the Host header matches social.immibis.com, the request is passed to Pleroma (a Fediverse server) in HTTP format via a Unix socket. For the main website, it's passed to a program I called appsvr (formerly blog_scgi) in SCGI format via a Unix socket.
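
Sketched out, the routing looks something like this - social.immibis.com is from the description above, but the other hostname, the socket paths and the TLS details are placeholders rather than my actual config:

    # Fediverse server: proxied as HTTP over a Unix socket
    server {
        listen 443 ssl;
        server_name social.immibis.com;
        # ssl_certificate / ssl_certificate_key omitted

        location / {
            proxy_set_header Host $host;
            proxy_pass http://unix:/run/pleroma.sock;
        }
    }

    # Main website: handed to appsvr as SCGI over a Unix socket
    server {
        listen 443 ssl;
        server_name www.example.com;   # the main site's hostname
        # ssl_certificate / ssl_certificate_key omitted

        location / {
            include scgi_params;
            scgi_pass unix:/run/appsvr.sock;
        }
    }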

When you're using a reverse proxy, HTTP isn't the best proxy protocol. Any reverse proxy has to transfer metadata from the layers below HTTP up to the HTTP layer, since that's the first layer that makes it past the proxy. This makes it possible to confuse metadata transferred by the proxy with metadata originating from the client. For example, it's common for a reverse proxy to insert an X-Forwarded-For header containing the client's IP address, since the back end would otherwise only see the IP address of the reverse proxy. But what happens if the client already sent an X-Forwarded-For header? The proxy has to strip or overwrite it, and that has to be configured and tested. And what if the reverse proxy doesn't know it's supposed to insert an X-Forwarded-For header, but the back end checks for one? Then the client can just send one and the back end will see a fake client IP address.
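
With nginx proxying over HTTP, for instance, you have to remember to set the header yourself so that a client-supplied value can't survive - something like this (illustrative, not my actual config):

    location / {
        proxy_pass http://unix:/run/backend.sock;
        # Overwrite rather than append: $remote_addr is the address nginx
        # actually accepted the connection from, and setting the header
        # here replaces anything the client sent under the same name.
        proxy_set_header X-Forwarded-For $remote_addr;
    }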

Using a different protocol prevents these possibilities. In CGI, the web server launches a new process with a set of environment variables. Headers received from the client are passed in environment variables starting with HTTP_, such as HTTP_USER_AGENT. Other pieces of information, such as the client IP address, are passed in separate environment variables, such as REMOTE_ADDR, so there is no possibility of confusion. Additionally, when each piece of data is stored in its own environment variable, there's no need to re-parse the HTTP request. Any time a complicated data structure is parsed by more than one parser, there's a risk that each one could parse it differently.
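
A back end reading those variables can't mix the two sources up, even on purpose. A small sketch, in the same CGI shape as before (not from any real application):

    /* Anything the client sent arrives with an HTTP_ prefix;
     * REMOTE_ADDR is only ever filled in by the web server. A client
     * sending a "Remote-Addr:" header just produces HTTP_REMOTE_ADDR. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const char *real_ip = getenv("REMOTE_ADDR");      /* set by the server */
        const char *spoofed = getenv("HTTP_REMOTE_ADDR"); /* only set if the client sent such a header */

        printf("Content-Type: text/plain\r\n\r\n");
        printf("real client address: %s\n", real_ip ? real_ip : "(not set)");
        printf("client-supplied imitation: %s\n", spoofed ? spoofed : "(none)");
        return 0;
    }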

SCGI and FastCGI carry the same variables as CGI, but instead of passing them as environment variables to a newly launched process, they send the same set of key-value pairs over a socket with each request. Both protocols use simple, unambiguous encodings, unlike HTTP itself. FastCGI includes a lot of extra complexity to allow multiplexing multiple concurrent requests through a single socket, which nginx doesn't support anyway, so I chose the simpler SCGI protocol. SCGI doesn't support streaming request bodies, but I don't need that feature either.

SCGI is a very simple protocol that you could implement in a single afternoon.
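
To give a sense of how simple: the request is a single netstring (length, colon, data, comma) holding NUL-separated key-value pairs, followed by the raw request body. A hand-assembled GET request with no body would look like this on the wire, with <00> standing for a NUL byte:

    "57:"
    "CONTENT_LENGTH" <00> "0"   <00>
    "SCGI"           <00> "1"   <00>
    "REQUEST_METHOD" <00> "GET" <00>
    "REQUEST_URI"    <00> "/"   <00>
    ","
    (the request body would follow here if CONTENT_LENGTH were non-zero)

The back end writes a plain CGI-style response back on the same socket - headers such as Status: and Content-Type:, a blank line, then the body - and closes the connection.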

With request and response buffering, the overall flow of an SCGI request is a lot like a function call. nginx reads the whole HTTP request, converts it to SCGI format, connects to the back end process, sends the request, waits for the response, receives the whole response, closes the SCGI connection and sends the response back to the client at the client's speed. It's almost like implementing a function (pure or not): byte[] getResponse(Map metadata, byte[] body)

Buffering the entire response works fine for web pages, since they are not very big and buffering is very convenient. For larger files such as images, the application server returns an internal redirect (X-Accel-Redirect), which tells nginx to do another path lookup and fetch something else, possibly from a different back end. Locations marked internal; can only match internal redirects, so you can prevent clients from bypassing the application server. And since it's a different back end (such as a static directory instead of SCGI), it isn't subject to the constraints of SCGI, such as having to fully buffer the response. Pedantically, SCGI responses can be streamed if nginx is configured for it, but why spend effort on that just to serve a static file?
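
On the nginx side, that arrangement is just a location marked internal (the paths here are made up for illustration):

    # Never matched for client-supplied URLs, only for internal redirects.
    location /protected-files/ {
        internal;
        alias /srv/files/;
    }

The application returns an X-Accel-Redirect: /protected-files/photo.jpg header instead of a body, and nginx re-runs its location matching on that URI and serves the file itself.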

You could write your application server in C like mine, but if you want to actually be productive, I'd recommend using your favourite language instead.