Recently a very good discussion took place on the UPHPU mailing list regarding MySQL and PostgreSQL (a.k.a. pg or pgsql), the two most popular open-source database engines, and when you might want to choose one or the other for your project. Many great insights were shared, and we’re sharing some of them here, in hopes that it will help others in the future.
For many people, MySQL vs PgSQL is more of a religious question than a technical one, and there is definitely something to be said for sticking to what is already familiar to you. There are also some very good technical considerations in choosing which one is more appropriate for your application. First and foremost, both MySQL and PG are excellent relational database management systems (RDBMS), and in almost every case, either one will work great, and there is no wrong answer when asking which one you should use.
Probably the biggest and most frequently discussed differences between MySQL and PG are in features and complexity. PostgreSQL has more of both. There are many features of SQL that PG has supported for years, that are just recently becoming available in MySQL. Some mentioned in the discussion are subselects, transactions, stored procedures, contstraints, triggers, and a few other advanced SQL features. Most or all of those features are (or will be) supported in MySQL 4.0, 4.1, or 5.0, and some, like transactions, were available in the MySQL 3.23.xx releases with certain table types, like InnoDB.
Because of the complexity added to PGSQL to support those features, MySQL in its default configuration (i.e. with MyISAM tables) is generally regarded as faster than PGSQL. Because of MySQL’s different table types (which use different database engines), you can effectively “turn on” and “turn off” features like transactions on a per-table basis, which provides a significant performance improvement. Many say that when using the advanced features of MySQL (i.e. when using InnoDB tables), performance is very comparable between the two databases.
There are virtues to having a less complex feature set: Simple is easier, Simple is faster, Simple is more stable. MySQL requires less system administration to configure and maintain, and for most people, it “just works” without any real tweaking or customization. Especially when being used with PHP, this is a big advantage. Most applications written in PHP simply do not require most of the advanced features that SQL has to offer, and PG would be overkill, and that overkill comes at a price.
Many people think that they would be better off with PGSQL because someday they might need a feature that PGSQL has that MySQL doesn’t have. As with most other things, if you don’t have something specific and reasonably likely in mind already, you most likely won’t ever use those features. Another important consideration is that there are very few cases where those advanced features are truly required. Granted, sometimes a subselect may make a simpler query, but it can almost always be rewritten as a join, which usually executes faster and is about as readable as the query using subselects. Especially in conjunction with PHP, it is very difficult to contrive a situation where the lack of a particular feature makes something impossible or very difficult with MySQL where it would be much easier with PostgreSQL.
Another aspect to consider is support. MySQL has a larger user base, and is supported by more applications. In cases where there isn’t support for both of them, an application is more likely to have MySQL support than PgSQL support. When both are supported, the MySQL support is generally better tested than the PgSQL support.
If you’re trying to help a client or customer or manager decide what database should be used, marketing and branding apply. If your client has heard of individual database engines, the most likely ones they’ll know about are Oracle, Microsoft SQL Server, and MySQL. Selling them on PostgreSQL can be an uphill battle if they haven’t heard of it before, though billing it as practically “an open-source Oracle” may do the trick, and is probably relatively accurate. PgSQL is the closest free and open-source equivalent to Oracle.
For others, licensing is an important consideration. PostgreSQL is released under a BSD-style license, which allows it to be used in almost unlimited ways, including embedding in closed-source applications. MySQL is released under the GPL and under a commercial licensing policy, which requires certain applications to pay for a license if their use doesn’t comply with the GPL. The main time this matters is if it will be embedded in a proprietary application.
Administrative overhead can be one of the biggest ongoing costs of your application, so careful consideration should be used before choosing something that will require more administration. MySQL is generally considered to require less administration than PostgreSQL.
If you’re already experienced with a complex database like Oracle, and like it, Postgre will be more similar to Oracle than MySQL will. If you haven’t used Oracle, or didn’t like using it, you’ll be more comfortable with MySQL in most cases.
Many people see performance (speed) and capacity (both volume of data and volume of queries) as key factors in choosing their database. In the vast majority of cases, any performance or capacity issues you might run into will not be solvable by changing database engines between MySQL and Postgres. It is probably 100x to 1000x easier to max out your database server by running poorly optimized queries, by having a poorly laid outdatabase schema, or by having a poorly written interaction between your application and your database, than to max out your database server by actually requiring an extremely high volume of data to go in or out. By the time you reach that point, you’re more likely to be running into problems like memory bandwidth, disk bandwidth, and seek times more than problems due to (or solvable by) your database engine.
Both MySQL and PostgreSQL will eventually “hit a wall” in their performance when you get enough queries. In general, a simpler, leaner engine, like MySQL, will hit that wall later than a more complex engine. Things like replication can help with performance and query capacity issues, but once you reach a high enough rate of inserts, deletes, and updates (“write” operations) in your database, replication won’t really help much. A “read” can occur at any database replica, but a “write” has to occur at all of them. If your performance problem is due to writes rather than reads, you’re better off getting a beefier server (within reason) than using replication.
If you’re not familiar with either of the two databases, or unfamiliar with SQL altogether, learning MySQL will probably be easier for you than learning PostgreSQL. If you already know Oracle, Postgres will be more familiar than MySQL. If your development team is more experienced with one or the other, that experience will probably provide more benefit than choosing a different engine. Retraining and working in something you aren’t familiar with can be somewhat costly, and usually should be avoided where possible.
There are a lot of considerations, but here’s the bottom line: Both MySQL and PostgreSQL are extremely good and capable database servers. There are a few differences that may sway you one way or the other, and there really isn’t a wrong choice. There are times when PostgreSQL is a clearly better choice for one reason or another, and in that case, don’t hesitate to use it. Keep in mind, however, that the times when PgSQL is necessary or is worth the extra complexity are far, far, fewer than the cases where MySQL is sufficient and more appropriate.
We at UPHPU hope that this article was useful to you, and we welcome your feedback. The content in this article is due to contributions by many people, and we would most notably like to thank Ray “bigdog” Hunter, Josh “dataw0lf” Simpson, Grant Shipley, Jon Jensen, Mike Trionfo, Tyler Gee, Jeffrey Moss, and Lonnie Olsen, and Scott Hill, who started the discussion.