Reducing Database Load using Redis and Batched Insert/Update SQL Queries

I’ve been running my “T-Comm” Bus Locator website for over seven years now. As a quick recap, on the backend, it’s a procedural PHP/MySQL website, kind of a typical thing one would see of PHP development circa 2000s.

The high level overview is that every minute or two, the system polls TransLink’s real time information APIs for the locations of the all the buses in the system. The system then calculates some information based on TransLink’s GTFS data, which has the schedule information for the entire system. Then it saves the data into a MySQL database, and some other outputs for use on the web interface.

So naturally as more buses operate during the daytime (up to around a thousand buses), the CPU load increases during the day, then drops off at night (down to several on the Night Bus routes). It could take about two minutes to calculate all the necessary information to update all the buses during the daytime

CPU load over the last week

Last night, I set off to find a quick way to optimize this, without having to rewrite the entire system (which arguably it sort of needs, but that’s for another day). From a cursory at the system resources using tools such as htop, it was pretty obvious that the bottleneck was on the MySQL database and its reading and writing to the relatively slow hard disks.

Continue Reading

Checking for a Null value in SQL

Yesterday I fell into a very simple but dangerous SQL trap.  Can you identify it?

SELECT first_column, another_column
FROM my_table
WHERE first_column = 1
AND (another_column >= 100 OR
another_column = null)

Found it?  It’s the comparison to null.  My intention was to get rows where another_column was null, however the query did not do so.  It didn’t give me any results where another_column was null, but didn’t give me any error either.  Why?

Null in SQL is not a real value.  Null means “lack of a value” or “unknown”. Anything compared to null leads to an “unknown” result. Therefore, in SQL there are actually three values conditions can evaluate to: true, false, and unknown/null. This is called three-valued logic. SQL WHERE clauses only return rows that evaluate to true, not rows that evaluate to false or “unknown”.

The proper way of checking for null is by using the IS NULL or IS NOT NULL predicates. The correct SQL query would have been as follows:

SELECT first_column, another_column
FROM my_table
WHERE first_column = 1
AND (another_column >= 100 OR
another_column IS null)

So if you’re used to languages where you check for null with equal signs, remember that in SQL, you must use the special IS NULL or IS NOT NULL predicates instead of the equals sign or other comparison operators. Don’t get tripped up like I did.

If you want to read more about, Wikipedia has a detailed article on null in SQL.