It happened that one hosted site had a sudden surge in visitors, and though the server was not brought to its knees completely, pages loaded very slowly; basically the server was DoSed. Before this, the load on the server had occasionally spiked all of a sudden, which could be solved with an Apache restart, but I never knew why it happened. Maybe this was it…
Apache in Ubuntu is installed with KeepAlive On and KeepAliveTimeout set to 5 seconds by default. KeepAlive is a good thing if your server has resources in abundance (RAM and CPU, and if the pages use a database, disk performance too), so that Apache can run many threads.
But let’s step back a bit and explain what KeepAlive is and why it is good for us. When KeepAlive is supported by both the server and the browser (I don’t think there is a browser in existence that does not support it, mainly because persistent connections are the default in HTTP/1.1), the browser does not open a new HTTP connection, a new socket, for each element of the page, such as images, CSS and JavaScript files, tearing down the previous one each time. Instead it keeps the connection open and requests the elements over it one by one. This saves resources on both sides, but this is also where the problem starts.

The browser can keep this connection open for a long time, tying up a thread on the server side. If, let’s say, Apache is configured to run 50 threads and we have 50 concurrent users browsing, the 51st will not be served and will time out. I don’t know if browsers still do this, but they used to open several parallel connections to load the page elements, so half or a quarter as many browsers could consume all the threads on the server side. To work around this, Apache has a timeout parameter, KeepAliveTimeout, that disconnects a browser once its connection has been idle for that long. The Apache documentation says of these two parameters that KeepAlive can improve page load performance, but on busy servers it can cause problems.

This is what happened to the server: the site was rushed, all threads were busy, and the server stopped responding to browsers. After I disabled KeepAlive the situation improved, but then RAM and CPU proved to be the bottleneck; that is a different story. I turned KeepAlive back on and set the timeout to 1 second. Unfortunately the unit of this parameter is whole seconds in Apache 2.2; from 2.3.2 on it also accepts milliseconds (written with an ms suffix, e.g. 500ms). I believe half a second is a good value, but this is a guess, not the result of any measurement.
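To make the thread-exhaustion arithmetic concrete, here is a rough back-of-the-envelope model in Python. It is a sketch under stated assumptions, not a measurement: the worker count, the number of parallel connections per browser, the transfer time and all the names (Scenario, hold_time and so on) are hypothetical. It also assumes the worst case, where every connection sits idle for the full KeepAliveTimeout after the last element is downloaded. By Little’s law, a pool of workers can then sustain roughly workers / (connections per visitor × hold time) new visitors per second.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical load scenario; every field is an illustrative assumption."""
    workers: int                  # Apache worker threads/processes (e.g. 50)
    connections_per_visitor: int  # parallel connections one browser opens
    busy_seconds: float           # time a connection spends transferring the page
    keepalive_timeout: float      # idle seconds before Apache closes the connection


def visitors_until_exhausted(s: Scenario) -> int:
    """How many simultaneous browsers it takes to occupy every worker."""
    return s.workers // s.connections_per_visitor


def hold_time(s: Scenario) -> float:
    """Worst case: the connection idles for the full timeout after the last request."""
    return s.busy_seconds + s.keepalive_timeout


def max_visitors_per_second(s: Scenario) -> float:
    """Little's law (L = lambda * W) solved for the arrival rate lambda."""
    return s.workers / (s.connections_per_visitor * hold_time(s))


if __name__ == "__main__":
    # Compare the Ubuntu default (5 s) with the 1 s and 0.5 s values discussed above.
    for timeout in (5.0, 1.0, 0.5):
        s = Scenario(workers=50, connections_per_visitor=2,
                     busy_seconds=0.5, keepalive_timeout=timeout)
        print(f"KeepAliveTimeout={timeout}s: "
              f"{visitors_until_exhausted(s)} browsers fill all workers, "
              f"~{max_visitors_per_second(s):.1f} new visitors/s sustainable")
```

With these made-up numbers, lowering the timeout from 5 seconds to 1 second raises the sustainable arrival rate from about 4.5 to about 16.7 visitors per second, which is in line with the improvement described above. A real server will of course also be limited by RAM and CPU, which this model ignores.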