This entry is part 1 of 5 in the Building a Database Abstraction Layer Series
The WordPress database class and the various other APIs available make it pretty easy to interact with the WordPress database, but often you need to go a step further and create a custom API specific to your plugin’s database tables. Large plugins especially can benefit from a custom database API, as it standardizes common tasks, making them more reliable, more repeatable, and simpler to debug.
There used to be a mentality among WordPress plugin developers that you should never use custom tables in your plugin because it wasn’t the “WordPress” way. Let’s start by throwing that attitude out the window.
When it comes to storing large amounts of data that does not very closely mimic existing WordPress database schemas, you should absolutely use custom tables. Choosing not to use a custom table will likely cause more harm than good. While it’s possible to store almost anything you want as a custom post type in the wp_posts table, that does not mean you should or that it is even a remotely good idea.
Custom tables give us the flexibility to store data in the way that we choose, which in turn allows us to build highly efficient data storage and retrieval mechanisms.
Let’s compare for a moment using wp_posts/wp_postmeta to store data with using a custom table tailored to our needs. For this example, assume that we are storing order information for an eCommerce website.
For each order record, we need to store the following information (not an exhaustive list):
- Billing Address Line 1
- Billing Address Line 2
- Billing City
- Billing State / Province
- Billing Country
- Billing Postal Code
- Shipping Address Line 1
- Shipping Address Line 2
- Shipping City
- Shipping State / Province
- Shipping Country
- Shipping Postal Code
- Customer ID (customers stored in separate table)
- Purchase Method (PayPal, Stripe, etc.)
- IP Address
That’s a lot of information, and it really only touches on the basics. Each order also has other related records that need to be stored: the products that were purchased (usually stored separately and then related to the order by an order ID), the customer’s personal information, access logs, refunds, and more.
So how does this information get stored when using wp_posts/wp_postmeta? On the surface, it’s very straightforward:
- Create a custom post type called “orders”
- Create a post type entry for each order
- Store each of the items above in postmeta
Seems pretty great, right? In some ways it is. For example, retrieving a list of orders is as simple as calling WP_Query and get_post_meta(). The data is accessible and you rarely have to worry about caching or security because the built-in WordPress APIs handle it all for you. Updating data is also exceptionally simple, thanks to wp_update_post() and update_post_meta().
The ease with which WordPress lets us store and retrieve data in the posts/postmeta tables is one of WordPress’s greatest advantages for plugin developers, but it’s also a giant crutch. We tend to get in the habit of assuming it is a good idea to store data there simply because it is so damn easy to do so.
If it’s so easy to store data in wp_posts and wp_postmeta, why is it bad? There are numerous reasons, but let’s look at a couple of the main points.
Reason number 1: efficiency
First, it’s horribly inefficient to store data like this in wp_posts/wp_postmeta. In a database table tailored specifically to eCommerce order data, all of the data above would be stored as a single row, but in wp_posts and wp_postmeta it gets stored as 20+ individual rows across two different tables: one record in wp_posts holds the main order record, and 19+ individual rows in wp_postmeta each hold one piece of metadata.
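To make the row counts concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for MySQL. The orders schema, the trimmed-down wp_posts/wp_postmeta tables (only the columns the sketch touches), the `_order_*` meta key names, and the `row_counts()` helper are all hypothetical; it stores one order both ways and counts the rows each approach produces.

```python
import sqlite3

# Hypothetical order: the fields listed above plus an order total.
ORDER = {
    "billing_address_1": "123 Main St", "billing_address_2": "Suite 4",
    "billing_city": "Springfield", "billing_state": "IL",
    "billing_country": "US", "billing_postcode": "62701",
    "shipping_address_1": "123 Main St", "shipping_address_2": "Suite 4",
    "shipping_city": "Springfield", "shipping_state": "IL",
    "shipping_country": "US", "shipping_postcode": "62701",
    "customer_id": 42, "purchase_method": "stripe",
    "ip_address": "203.0.113.7", "total": 49.99,
}

# Hypothetical custom table: one column per field, one row per order.
CUSTOM_SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, status TEXT, date TEXT, total REAL,
    customer_id INTEGER, purchase_method TEXT, ip_address TEXT,
    billing_address_1 TEXT, billing_address_2 TEXT, billing_city TEXT,
    billing_state TEXT, billing_country TEXT, billing_postcode TEXT,
    shipping_address_1 TEXT, shipping_address_2 TEXT, shipping_city TEXT,
    shipping_state TEXT, shipping_country TEXT, shipping_postcode TEXT
);
"""

# Simplified stand-ins for wp_posts/wp_postmeta (only the columns used here).
WP_SCHEMA = """
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_type TEXT, post_status TEXT, post_date TEXT);
CREATE TABLE wp_postmeta (meta_id INTEGER PRIMARY KEY, post_id INTEGER, meta_key TEXT, meta_value TEXT);
"""


def row_counts(order: dict) -> tuple[int, int]:
    db = sqlite3.connect(":memory:")
    db.executescript(CUSTOM_SCHEMA + WP_SCHEMA)

    # Custom table: the whole order is a single row.
    # Column names come from our own trusted dict keys, never from user input.
    cols = ", ".join(order)
    marks = ", ".join("?" * len(order))
    db.execute(
        f"INSERT INTO orders (status, date, {cols}) VALUES (?, ?, {marks})",
        ["complete", "2015-07-14", *order.values()],
    )

    # Posts/postmeta: one post row, then one meta row per field.
    post_id = db.execute(
        "INSERT INTO wp_posts (post_type, post_status, post_date) VALUES (?, ?, ?)",
        ("order", "complete", "2015-07-14"),
    ).lastrowid
    db.executemany(
        "INSERT INTO wp_postmeta (post_id, meta_key, meta_value) VALUES (?, ?, ?)",
        [(post_id, f"_order_{key}", str(value)) for key, value in order.items()],
    )

    custom = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    wp = (db.execute("SELECT COUNT(*) FROM wp_posts").fetchone()[0]
          + db.execute("SELECT COUNT(*) FROM wp_postmeta").fetchone()[0])
    db.close()
    return custom, wp


print(row_counts(ORDER))  # (1, 17)
```

Even with only the 15 fields listed above plus an order total, the posts/postmeta layout produces 17 rows against the custom table’s 1; a real order with more fields pushes that past 20.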
Why would we ever opt to store data in tables that force us to have 20+ different rows for a “single” record? That’s asinine.
Reason number 2: calculation queries
Storing data that needs to support calculation queries in wp_postmeta is also horribly inefficient. Because a single order takes up 20+ rows in the database, it’s exceptionally difficult to perform calculation queries on the data.
Let’s say that you want to know how much revenue the store made during July 2015. In an optimal database schema, the query might look something like this:
SELECT SUM(total) FROM orders WHERE status = 'complete' AND 7 = MONTH(date) AND 2015 = YEAR(date);
Now let’s see that same query when the metadata is stored in post meta:
SELECT SUM(m.meta_value) FROM wp_postmeta m INNER JOIN wp_posts p ON m.post_id = p.ID WHERE p.post_status = 'complete' AND m.meta_key = '_order_total' AND 7 = MONTH(p.post_date) AND 2015 = YEAR(p.post_date);
In order to get a sum of the order totals, we have to use a JOIN to get only the meta values that correspond to completed orders.
This isn’t all that bad, but it certainly isn’t great. The first query is far simpler and, on large data sets, a lot more efficient.
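The two queries can be compared side by side with a small sqlite3 sketch, assuming hypothetical sample orders and simplified table definitions. SQLite has no MONTH() or YEAR() functions, so strftime() stands in for them, and the meta value is explicitly CAST because wp_postmeta stores meta_value as text. The join is written as an INNER JOIN, which is how a LEFT JOIN filtered on p.post_status behaves anyway.

```python
import sqlite3

# Hypothetical sample orders: (status, date, total).
ORDERS = [
    ("complete", "2015-07-03", 20.00),
    ("complete", "2015-07-21", 15.50),
    ("refunded", "2015-07-22", 99.00),
    ("complete", "2015-08-01", 40.00),
]


def build() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, date TEXT, total REAL);
        CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_type TEXT, post_status TEXT, post_date TEXT);
        CREATE TABLE wp_postmeta (meta_id INTEGER PRIMARY KEY, post_id INTEGER, meta_key TEXT, meta_value TEXT);
    """)
    for status, date, total in ORDERS:
        # Same order stored in both layouts.
        db.execute("INSERT INTO orders (status, date, total) VALUES (?, ?, ?)",
                   (status, date, total))
        post_id = db.execute(
            "INSERT INTO wp_posts (post_type, post_status, post_date) VALUES ('order', ?, ?)",
            (status, date),
        ).lastrowid
        # meta_value is a text column, just as in wp_postmeta.
        db.execute(
            "INSERT INTO wp_postmeta (post_id, meta_key, meta_value) VALUES (?, '_order_total', ?)",
            (post_id, str(total)),
        )
    return db


# strftime() replaces MySQL's MONTH()/YEAR(), which SQLite lacks.
CUSTOM_SQL = """
    SELECT SUM(total) FROM orders
    WHERE status = 'complete' AND strftime('%Y-%m', date) = '2015-07'
"""
POSTMETA_SQL = """
    SELECT SUM(CAST(m.meta_value AS REAL)) FROM wp_postmeta m
    INNER JOIN wp_posts p ON m.post_id = p.ID
    WHERE p.post_status = 'complete' AND m.meta_key = '_order_total'
      AND strftime('%Y-%m', p.post_date) = '2015-07'
"""

db = build()
custom_total = db.execute(CUSTOM_SQL).fetchone()[0]
postmeta_total = db.execute(POSTMETA_SQL).fetchone()[0]
print(custom_total, postmeta_total)  # 35.5 35.5
```

Both queries return the same answer; the difference is how much work the schema forces onto the query.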
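Note: the query I’ve shown is a relatively simple one. As calculations get more complex, queries against data stored in wp_posts and wp_postmeta get dramatically worse, because every additional meta field you filter or group by typically requires another JOIN against wp_postmeta.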
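As a rough illustration of how that grows, the sketch below (again sqlite3 in place of MySQL, with hypothetical data and `_order_*` meta keys) breaks July 2015 revenue down by billing country. The custom table only gains a GROUP BY column, while the postmeta version needs a second join on wp_postmeta, one per meta field involved.

```python
import sqlite3

# Hypothetical sample data: (status, date, total, billing_country).
ORDERS = [
    ("complete", "2015-07-03", 20.00, "US"),
    ("complete", "2015-07-21", 15.50, "CA"),
    ("complete", "2015-07-25", 30.00, "US"),
]

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, date TEXT, total REAL, billing_country TEXT);
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_status TEXT, post_date TEXT);
    CREATE TABLE wp_postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
""")
for status, date, total, country in ORDERS:
    db.execute("INSERT INTO orders (status, date, total, billing_country) VALUES (?, ?, ?, ?)",
               (status, date, total, country))
    pid = db.execute("INSERT INTO wp_posts (post_status, post_date) VALUES (?, ?)",
                     (status, date)).lastrowid
    db.executemany("INSERT INTO wp_postmeta VALUES (?, ?, ?)",
                   [(pid, "_order_total", str(total)),
                    (pid, "_order_billing_country", country)])

# Custom table: the new dimension is just one more column.
CUSTOM_SQL = """
    SELECT billing_country, SUM(total) FROM orders
    WHERE status = 'complete' AND strftime('%Y-%m', date) = '2015-07'
    GROUP BY billing_country ORDER BY billing_country
"""
# Postmeta: each meta field used needs its own join on wp_postmeta.
POSTMETA_SQL = """
    SELECT country.meta_value, SUM(CAST(total.meta_value AS REAL))
    FROM wp_posts p
    INNER JOIN wp_postmeta total ON total.post_id = p.ID AND total.meta_key = '_order_total'
    INNER JOIN wp_postmeta country ON country.post_id = p.ID
        AND country.meta_key = '_order_billing_country'
    WHERE p.post_status = 'complete' AND strftime('%Y-%m', p.post_date) = '2015-07'
    GROUP BY country.meta_value ORDER BY country.meta_value
"""

custom_rows = db.execute(CUSTOM_SQL).fetchall()
postmeta_rows = db.execute(POSTMETA_SQL).fetchall()
print(custom_rows)    # [('CA', 15.5), ('US', 50.0)]
print(postmeta_rows)  # [('CA', 15.5), ('US', 50.0)]
```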
Reason number 3: private data
Inherently, the wp_posts and wp_postmeta tables are designed for public data. WordPress was built as a blogging platform, so the data stored in its tables is typically non-sensitive, publicly safe data. There are certain areas of the WordPress database designed to hold sensitive data, but generally the posts and postmeta tables are not suited for this kind of data.
As developers, we take on a certain level of risk when we opt into storing private and sensitive data in a table that is inherently designed for public data.
With over 50,000 plugins available for WordPress, it’s impossible for us to know what other plugins may do with the data that we store in wp_posts and wp_postmeta. It’s not terribly uncommon to hear about a plugin exposing post/postmeta data to the world. If the data is confidential (such as customer emails and addresses), this could cause uncomfortable problems.
By storing data in custom-designed tables accessed through our own abstraction layer, we can be far more confident that the data is safe and not being manipulated or exposed by other plugins running on the website, since generic post queries and post APIs used by those plugins will never touch it.
Reason number 4: control
When building within an existing schema, you are limited in where and how you store data. By utilizing custom tables and your own API for accessing the data, you have complete freedom to store and access data however you wish. This provides a huge amount of flexibility, though it also comes with certain responsibilities when you build your own tables and API.
Security. It is critically important that you take security seriously and ensure that your API is not susceptible to SQL injection, timing attacks, cross-site scripting and other attack methods.
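For SQL injection specifically, the core rule is to never splice user input into a query string. In WordPress, that means passing values through $wpdb->prepare(); the sketch below shows the same principle with sqlite3’s `?` placeholders. The `orders` table and both helper functions are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL);
    INSERT INTO orders (status, total) VALUES ('complete', 20.0), ('pending', 5.0);
""")


def get_orders_unsafe(status: str) -> list:
    # DON'T: user input is pasted straight into the SQL string.
    return db.execute(f"SELECT id FROM orders WHERE status = '{status}'").fetchall()


def get_orders(status: str) -> list:
    # DO: the driver binds the value, so it is never parsed as SQL.
    return db.execute("SELECT id FROM orders WHERE status = ?", (status,)).fetchall()


malicious = "x' OR '1'='1"
print(get_orders_unsafe(malicious))  # returns both orders: the injected OR matches every row
print(get_orders(malicious))         # returns nothing: the input is treated as a literal status
```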
Caching. WordPress provides caching for nearly all of its internal database methods. When you build your own API, you usually will not get to lean on the caching WordPress core already has, so you will have to include it yourself.
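A minimal sketch of what including caching yourself can mean: a read-through cache in front of a custom table, invalidated whenever the row changes. Inside WordPress you would more likely build this on wp_cache_get(), wp_cache_set() and wp_cache_delete() so that persistent object caches are honored; the `OrdersAPI` class here is hypothetical and uses a plain Python dict as the cache.

```python
import sqlite3


class OrdersAPI:
    """Hypothetical read-through cache in front of a custom orders table."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        self.cache: dict[int, tuple] = {}
        self.queries = 0  # counts real database hits, for illustration

    def get(self, order_id: int):
        if order_id in self.cache:
            return self.cache[order_id]
        self.queries += 1
        row = self.db.execute(
            "SELECT id, status, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is not None:  # only cache rows that exist
            self.cache[order_id] = row
        return row

    def update_status(self, order_id: int, status: str) -> None:
        self.db.execute("UPDATE orders SET status = ? WHERE id = ?", (status, order_id))
        # Invalidate so the next read sees the new value, not a stale copy.
        self.cache.pop(order_id, None)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
db.execute("INSERT INTO orders (status, total) VALUES ('pending', 20.0)")

api = OrdersAPI(db)
api.get(1)
api.get(1)  # served from the cache, no query
api.update_status(1, "complete")
print(api.get(1), api.queries)  # (1, 'complete', 20.0) 2
```

The important design choice is that every write path clears the affected cache entry; forgetting that is the most common source of stale data in hand-rolled caches.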
Maintenance. It is entirely up to you to ensure that your API remains stable.
Reason number 5: back to efficiency
The inefficiency of storing data in tables that are not meant to store that kind of data cannot be repeated enough.
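Let’s look at the columns of the wp_posts table for a moment: ID, post_author, post_date, post_date_gmt, post_content, post_title, post_excerpt, post_status, comment_status, ping_status, post_password, post_name, to_ping, pinged, post_modified, post_modified_gmt, post_content_filtered, post_parent, guid, menu_order, post_type, post_mime_type, and comment_count.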
Those are the columns we have available to us to drop data into when using a custom post type. Since we’re using eCommerce data as our example, let’s see which columns we might use and what we will put in them.
- ID – This could be our order ID. If we need sequential order numbers, however, we’ll have to have a separate postmeta row for it.
- post_author – If our customers have user accounts, this could be the customer’s user ID. Not used for guest purchases.
- post_date – The date of the order.
- post_date_gmt – The date of the order in GMT.
- post_content – Order notes perhaps?
- post_title – Perhaps the customer name or email???
- post_excerpt – Another option for order notes?
- post_status – The status of the order.
- post_modified – Last time order was modified.
- post_modified_gmt – Last time order was modified in GMT.
- post_type – The “order” post type.
Notice that there are a lot of columns that are “maybes” and even more columns that are simply not used. There are 12 columns that serve absolutely no purpose for our order data; they just sit there and take up space. Of the columns we do use, only 6 are used for their actual purpose in WordPress.
When storing 20+ pieces of information for each order, why would we opt to store them in a table that has only 6 semi-useful columns for us, forcing us to dump the rest into a metadata table? It’s just horribly inefficient, especially considering that there are far superior options, and that’s where a custom table comes in.
While there are some great benefits to using the standard WordPress database schema and APIs to store data, when it comes to storing big data (such as eCommerce data), the costs severely outweigh the benefits.
In the next parts of this tutorial series, we will work through the process of building an API to create, manage, and utilize custom tables to store large amounts of data.