I’ve been running my “T-Comm” Bus Locator website for over seven years now. As a quick recap, on the backend, it’s a procedural PHP/MySQL website, kind of a typical thing one would see of PHP development circa 2000s.
The high level overview is that every minute or two, the system polls TransLink’s real time information APIs for the locations of the all the buses in the system. The system then calculates some information based on TransLink’s GTFS data, which has the schedule information for the entire system. Then it saves the data into a MySQL database, and some other outputs for use on the web interface.
So naturally as more buses operate during the daytime (up to around a thousand buses), the CPU load increases during the day, then drops off at night (down to several on the Night Bus routes). It could take about two minutes to calculate all the necessary information to update all the buses during the daytime
Last night, I set off to find a quick way to optimize this, without having to rewrite the entire system (which arguably it sort of needs, but that’s for another day). From a cursory at the system resources using tools such as htop, it was pretty obvious that the bottleneck was on the MySQL database and its reading and writing to the relatively slow hard disks.
Continue Reading