Regular Kamusi visitors have probably noticed that the site has suffered from chronic problems with page loads. Last week things got so bad that we had to restart our server several times. We scoured our usage logs and analyzed our system, and we think we've found the cause and fixed it.
The key to solving the problem lay in our current hardware setup. We are using two servers at the moment, one for our regular pages and one for our data. Pages with URLs starting with "kamusiproject.org" come off the new server, and data pages with URLs beginning with "perl.kamusiproject.org" are physically served from a different machine. This is a temporary situation while we migrate the data to the new PALDO system that will run on the new server, but fortunately it gave us the ability to understand which pages were giving us the load problems.
Most hits to the Kamusi Project, more than 1,000,000 a month, are dictionary lookups. Our other pages, such as the discussion forum and this blog, receive only a fraction of the traffic. However, we realized that the data server, which hosts the dictionary lookups and therefore carries most of our traffic, was working just fine. The slowdowns were confined to the server with the much lighter load. Our problem therefore wasn't due to the amount of traffic we were serving, but rather to something inherent in the code behind our regular pages.
We reviewed our "theme," the underlying style information that places everything correctly on each page and gives the project its distinct look. Lo and behold, we came across a single missing instruction: something called $closure. Without $closure, pages didn't know when they were supposed to stop loading. When we had numerous users all trying to open pages using this broken theme, each user would retain an open line to our server, looking for the rest of the page. A very small spike in traffic would quickly clog the system, leading to very large increases in page loading times for everyone.
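For anyone who wants to guard against the same kind of omission, here is a minimal sketch of a check that scans a theme template for required instructions. It assumes the template is plain text in which the instruction appears as the literal token $closure; the function name missing_tokens and the two sample templates are hypothetical and are not taken from our actual theme.

```python
import re


def missing_tokens(template_text, required=("$closure",)):
    """Return the required tokens that never appear in the template text."""
    missing = []
    for token in required:
        # Match the token only when it is not the prefix of a longer
        # identifier, so "$closure_extra" does not count as "$closure".
        pattern = re.escape(token) + r"(?![A-Za-z0-9_])"
        if not re.search(pattern, template_text):
            missing.append(token)
    return missing


# Hypothetical templates, for illustration only.
broken_template = "<body>\n$content\n</body>"
fixed_template = "<body>\n$content\n$closure\n</body>"

print(missing_tokens(broken_template))  # ['$closure']
print(missing_tokens(fixed_template))   # []
```

A check like this could run whenever the theme is edited, so that a dropped instruction is caught before it reaches the live site.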
I don't know how long this problem has been going on, but I suspect it was a factor in many of the site outages we've experienced over the last several months. I hope so, because that would mean this one fix takes care of those outages too. We've been monitoring our load since implementing the fix on Saturday morning; our traffic level is normal, and we haven't had a single unusual spike. So I'm hopeful that we've truly achieved closure!