I’ve been running my “T-Comm” Bus Locator website for over seven years now. As a quick recap, on the backend, it’s a procedural PHP/MySQL website, kind of a typical thing one would see of PHP development circa 2000s.
The high level overview is that every minute or two, the system polls TransLink’s real time information APIs for the locations of the all the buses in the system. The system then calculates some information based on TransLink’s GTFS data, which has the schedule information for the entire system. Then it saves the data into a MySQL database, and some other outputs for use on the web interface.
So naturally as more buses operate during the daytime (up to around a thousand buses), the CPU load increases during the day, then drops off at night (down to several on the Night Bus routes). It could take about two minutes to calculate all the necessary information to update all the buses during the daytime
CPU load over the last week
Last night, I set off to find a quick way to optimize this, without having to rewrite the entire system (which arguably it sort of needs, but that’s for another day). From a cursory at the system resources using tools such as htop, it was pretty obvious that the bottleneck was on the MySQL database and its reading and writing to the relatively slow hard disks.
Yesterday I fell into a very simple but dangerous SQL trap. Can you identify it?
SELECT first_column, another_column
FROM my_table
WHERE first_column = 1
AND (another_column >= 100 OR
another_column = null)
Found it? It’s the comparison to null. My intention was to get rows where another_column was null, however the query did not do so. It didn’t give me any results where another_column was null, but didn’t give me any error either. Why?
Null in SQL is not a real value. Null means “lack of a value” or “unknown”. Anything compared to null leads to an “unknown” result. Therefore, in SQL there are actually three values conditions can evaluate to: true, false, and unknown/null. This is called three-valued logic. SQL WHERE clauses only return rows that evaluate to true, not rows that evaluate to false or “unknown”.
The proper way of checking for null is by using the IS NULL or IS NOT NULL predicates. The correct SQL query would have been as follows:
SELECT first_column, another_column
FROM my_table
WHERE first_column = 1
AND (another_column >= 100 OR
another_column IS null)
So if you’re used to languages where you check for null with equal signs, remember that in SQL, you must use the special IS NULL or IS NOT NULL predicates instead of the equals sign or other comparison operators. Don’t get tripped up like I did.
If you want to read more about, Wikipedia has a detailed article on null in SQL.
Websites can be slow for many reasons, but an overwhelmingly common one is time spent performing database queries. Now database queries are likely a very important part of rendering the webpage – the information to populate a page has to come from somewhere. But sometimes rendering a page is slowed down by either unnecessary queries or improperly optimized queries.
I completely agree with his view. For most web developers, querying information from a database is quite a common action. Therefore, I believe learning how to write efficient and optimized database queries is a good skill to have.
I was fortunate to learn about this topic when I was developing MyBB way back years ago. At the bottom of each MyBB page, administrators are shown debugging info with a summary of loading times and other statistics.
Debugging information presented to the administrator on each MyBB page
Clicking on the “advanced details” link yields a much more in-depth analysis of the page generation, including an explanation of all the database queries used to create the page.
Advanced performance debugging details presented to MyBB administrators.
This is very useful to determine where the bottlenecks occur when generating the page. Some common mistakes include repeating similar queries instead of batching them together, selecting more data than is needed, and doing full table scans rather than indexed queries.
However, not all software and web frameworks has built-in performance debugging pages like what MyBB has. If you’re programming something from scratch, it might take some extra effort to create a useful debugging interface. What other tools can we use then?
Using the EXPLAIN query (works at least for MySQL, PostgreSQL, SQLite) will give you the details about how a specific query will be executed by the database. All you need to do is to tack “EXPLAIN” on at the beginning of your SELECT query, and run it through your favourite database management program. SitePoint has a thorough article on how to understand the output of an EXPLAIN query.
Of course, it’s also beneficial to have a concrete understanding of database indexes and how they work. One of my favourite sites on this topic is Use the Index, Luke! They also have a fun 3-minute quiz to test your knowledge, if you think you already know everything 😉 (I got 4/5 on my first try, how about you? 🙂 )
Do you know of any other resources to help with optimizing database queries? Let me know in the comments.
Ever since MyBB had a facility to create database backups, we have had questions on why there wasn’t a similar facility to import backups. The following reasons were taken into account while designing this feature:
1. Security
Reason
If a user somehow managed to get into an administrative account with the correct permissions, they could potentially revert your forum to an earlier state, or run arbitrary SQL queries on your database.
The Alternative
The use of an externally-installed database administration application reduces the chances that a malicious user can perform these actions on your database through MyBB.
2. Focus
Reason
We are developing a bulletin board system. We are focused on bringing you the features that will help run your bulletin board. Although database backup and restore are important tools, they are not used on a day-to-day basis in your forum’s administration. The backup function has more frequent use than the restore anyway. Our development resources are limited and we do not have the capacity to develop a fully-featured database administration program into the bulletin board software.
The Alternative
Third-party database administration programs are specifically developed for the purpose of managing your database. Their features are more robust than what we can provide. Just like how we concentrate on discussion board features, their development teams concentrate on what their product is: database administration.
Weight
Reason
In order to keep the MyBB package a reasonable size, we have to be conscious of what features are implemented. It is a waste of both physical and human resources to have a large package (clunky to install and update) with a steep learning curve (difficulty of usage).
The Alternative
Dedicated database administration solutions have the ability to incorporate more useful administrative features because that is their focus.
Reliability
Reason
In the hypothetical case where an import facility is indeed incorporated into MyBB, the requirement of using it is a working version of MyBB. Let’s say you just upgraded to MyBB 1.4, but you do not like it. Your database backup is 30MB. You start restoring it via the MyBB 1.4 Admin CP, but the restore times out part way. Your database is now half MyBB 1.2.x and half MyBB 1.4.x and neither version will operate correctly. You will need to resort to a third-party database administration program anyway to complete the process of reverting. A doctor can’t heal himself if he is unconscious.
The Alternative
A third-party database administration program does not depend on anything you are operating on. Even if an operation times out, this does not affect the program, you will be able to use the same program to correct the problem.
I hope this sheds some light on the rationale behind not including a database import feature.