This entry is part 1 of 5 in the Building a Database Abstraction Layer Series
- Custom Database API – Reasons for Custom Tables and an API
- Custom Database API – Structuring Your Data
- Custom Database API – Creating the Tables
- Custom Database API – The Basic API Class
- Custom Database API – A Sample Orders Class
The WordPress database class and the other various APIs available make it pretty easy to interact with the WordPress database but often times you need to go a step further and create a custom API specific to your plugin’s database tables. Large plugins especially can benefit from having a custom database API as it standardizes common tasks, making them more reliable, more repeatable, and simpler to debug.
There used to be a mentality among WordPress plugin developers that you should never use custom tables in your plugin because it wasn’t the “WordPress” way. Let’s start by throwing that attitude out the window.
When it comes to storing large amounts of data that does not very closely mimic existing WordPress database schemas, you should absolutely use custom tables. Choosing not to use a custom table will likely cause more harm than good. While it’s possible to store almost anything you want as a custom post type in the wp_posts table, that does not mean you should or that it is even a remotely good idea.
Custom tables give us the flexibility store data in the way that we choose, which in turn allows us to build highly efficient data storage and retrieval mechanisms.
Let’s compare and contrast for a moment the difference between using wp_posts/wp_postmeta to store data and using a custom table tailored to our needs. For this example, assume that we are storing order information for an eCommerce website.
For each order record, we need to store the following information (not a conclusive list):
- Total
- Taxes
- Discounts
- Billing Address Line 1
- Billing Address Line 2
- Billing City
- Billing State / Province
- Billing Country
- Billing Postal Code
- Shipping Address Line 1
- Shipping Address Line 2
- Shipping City
- Shipping State / Province
- Shipping Country
- Shipping Postal Code
- Customer ID (customers stored in separate table)
- Status
- Purchase Method (PayPal, Stripe, etc)
- IP Address
- Date
That’s a lot of information and it really only touches on the basics. Along with each order are other records that need to be stored. The products that were purchased (usually stored separately and then related to the order by an order ID), the customer’s personal information, access logs, refunds, and more.
So how does this information get stored when using wp_posts/wp_postmeta? On the surface, it’s very straight forward:
- Create a custom post type called “orders”
- Create a post type entry for each order
- Store each of the items above in postmeta
Seems pretty great, right? In some ways it is. For example, retrieving a list of orders is as simple as calling WP_Query and get_post_meta(). The data is accessible and you rarely have to worry about caching or security because the built-in WordPress APIs handle it all for you. Updating data is also exceptionally simple, thanks to wp_update_post() and update_post_meta().
The ease of which WordPress makes storing and retrieving data in/from the posts/postmeta table is one of WordPress’s greatest advantages for plugin developers, but it’s also a giant crutch. We tend to get in the habit of just assuming it is a good idea to store data there because it is so damn easy to store data there.
If it’s so easy to store data in wp_posts and wp_postmeta, why is it bad? There are numerous reasons, but let’s look at a couple of the main points.
Reason number 1: efficiency
First, it’s horribly inefficient to store data like this in wp_posts/wp_postmeta. In a database table tailored specifically to storing eCommerce order data, we would see all of the data above stored as a single row, but in wp_posts and wp_postmeta, it actually gets stored as 20+ individual rows across two different tables. There is one record in wp_posts that holds the main order record and then 19+ individual rows in wp_postmeta for each piece of metadata.
Why would we ever opt to store data in tables that forced us to have 20+ different rows for a “single” record? That’s asinine.
Reason number 2: inefficiency
Storing data that needs to support calculation queries is horribly inefficient. Due to a single order taking 20+ rows in the database, it’s exceptionally difficult to perform calculation queries on the data.
Let’s say that you want to know how much revenue the store has made during July, 2015. In an optimal database schema, the query might look something like this:
SELECT SUM(total) FROM orders WHERE status='complete' AND 7 = MONTH ( date ) AND 2015 = YEAR (date);
Now let’s see that same query when the metadata is stored in post meta:
SELECT SUM( m.meta_value ) FROM wp_postmeta m LEFT JOIN wp_posts p ON m.post_id = p.ID WHERE p.post_status = 'complete' AND m.meta_key = '_order_total' AND 7 = MONTH (p.post_date) AND 2015 = YEAR (p.post_date);
In order to get a sum of the order totals, we have to use a JOIN to get only the meta values that correspond to completed orders.
This isn’t all that bad, but it certainly isn’t great. The first query is magnitudes simpler and a lot more efficient.
Note: the query I’ve shown is a relatively simple query. As calculations get more complex, the queries to data stored in wp_posts and wp_postmeta get exponentially worse.
Reason number 3: private data
Inherently, the wp_posts and wp_postmeta tables are designed for public data. WordPress was built as a blogging platform so the data stored in its tables is typically non-sensitive publicly-safe data. There are certain areas of the WordPress database designed to hold sensitive data, but generally the posts and postmeta table are not suited for this kind of data.
As developers, we take on a certain level of risk when we opt into storing private and sensitive data in a table that is inherently designed for public data.
With over 50,000 plugins available for WordPress, it’s impossible for us to know what other plugins may do with the data that we store in wp_posts and wp_postmeta. It’s not terribly uncommon to hear about a plugin exposing post/postmeta data to the world. If the data is confidential (such as customer emails and addresses), this could cause uncomfortable problems.
By storing data in custom-designed tables using our own abstraction layers, we can be far more confident that the data is safe and not being manipulated or accessed by other plugins running on the website.
Reason number 4: control
When building within an existing schema, you are limited to where and how you store data. By utilizing custom tables and your own API for accessing the data, you have complete freedom to store and access data how you wish. This can provide huge amounts of flexibility, though it should also be mentioned that there are certain responsibilities you also take on when you build your own tables and API.
Security. It is critically important that you take security seriously and ensure that your API is not susceptible to SQL injection, timing attacks, cross-site scripting and other attack methods.
Caching. WordPress provides caching for nearly all of its internal database methods. When you build your own API, you usually will not get to lean on the caching WordPress core already has, so you will have to include it yourself.
Maintenance. It is entirely up to you to ensure that your API remains stable.
Reason 5: back to efficiency
The inefficiency of storing data in tables that are not meant to store that kind of data cannot be repeated enough.
Let’s look at the columns of the wp_posts table for a moment:
- ID
- post_author
- post_date
- post_date_gmt
- post_content
- post_title
- post_excerpt
- post_status
- comment_status
- ping_status
- post_password
- post_name
- to_ping
- pinged
- post_modified
- post_modified_gmt
- post_content_filtered
- post_parent
- guid
- menu_order
- post_type
- post_mime_type
- comment_count
Those are the columns we have available to us to drop data into when using a custom post type. Since we’re using eCommerce data as our example, let’s see which columns we might use and what we will put in them.
- ID – This could be our order ID. If we need sequential order numbers, however, we’ll have to have a separate postmeta row for it.
- post_author – If our customers have user accounts, this could be the customer’s user ID. Not used for guest purchases.
- post_date – The date of the order.
- post_date_gmt – The date of the order in GMT.
- post_content – Order notes perhaps?
- post_title – Perhaps the customer name or email???
- post_excerpt – Another option for order notes?
- post_status – The status of the order.
- comment_status
- ping_status
- post_password
- post_name
- to_ping
- pinged
- post_modified – Last time order was modified.
- post_modified_gmt – Last time order was modified in GMT
- post_content_filtered
- post_parent
- guid
- menu_order
- post_type – “order” post type
- post_mime_type
- comment_count
Notice that there are a lot of columns that are “maybes” and even more columns that are simply not used. There are 12 columns that have absolutely no purpose. They sit there and take up space. Of the columns we do use, only 6 are used for their actual purpose in WordPress.
When storing 20+ pieces of information for each order, why would we opt to store them in a table that only has 6 semi-useful rows to us, forcing us to dump the rest into a metadata table? It’s just horribly inefficient, especially considering that there are much more superior options, and that’s where a custom table comes in.
While there are some great benefits to using the standard WordPress database schema and APIs to store data, when it comes to storing big data (such as eCommerce data), the costs severely out weigh the benefits.
In the next parts of this tutorial series, we will work through the process of building an API to create, manage, and utilize custom tables to store large amounts of data.
Great insights into custom tables. Thanks for it!
Maybe you could add that requesting data from two and more tables is not only “less efficient” than using joins, but rather often *slower* than accessing the same data set from only one table. I think that makes the performance aspect more clear.
As efficiency is not the same like performance it could be worth to mention the performance boost of using a custom table explicitly.
Great stuff! Looking forward to the next article of series.
When will the next article come?
Hopefully sometime this week,
Sweet! Do you have other articles on what is the best and secured way to save data in a costum table?
If you give me a few more details about what you’re unsure of, I can try and link you to some helpful tutorials.
Thanks! I need to develop an site and we are the best way of handling the data is to do it my one table structure. But I like to know what is the best practice and secure way for writing and getting data from a table. Maybe WordPress have API for this. But I don’t know. Please help 🙂
You will want to use the $wpdb class (something we will get to in this tutorial series): https://codex.wordpress.org/Class_Reference/wpdb
Let me know if that helps!
Great introduction. The custom tables offer efficiency and performancee but come with a static data model. I can see why the post metas are still a great choice for the CPT. It basically offers high flexibility. I guess that’s the main reason why many CRM follow the same data model principals. I guess it’s always about using the best tool for the job.
There’s a LOT of information that is perfectly fine for postmeta, it’s just the big data sets that really don’t fit in posts that I’m referring to here.
Totally agree ! I don’t see why it would be a bad thing to create custom table to store data. There’s a database we do what we do. The only thing we have to provide is a good structure and some cleaning if the plugin is uninstalled.
I like plugins that create their custom table when necessary.
But I was not aware the post meta were designed to be public. Actually if you look into the brand new REST API you can’t access to this data whithout logging in.
Looking forward to read the newt part :=)
Some postmeta is designed to be public. Any meta field prefixed with an underscore, such as “_my_met_key”, is considered private. One of the issues is that there are some plugins and themes that do not respect this.
Also FYI I noticed this rather similar series of articles: http://code.tutsplus.com/series/custom-database-tables–wp-33839
(Admittedly it’s by a competitor of mine, Whatever)
That’s actually one of the tutorial I series I used for writing my first DB API 🙂
Idea: When deciding whether to use a custom table, write out all the types of queries you’ll use first. If they map well to a CPT, use that. If not use a custom table. And think about multisite too.
Definitely.
Multisite is so often overlooked with custom tables.
That’s assuming you know exactly how the plugin/app will be used. Typically, it evolves and new functionality that was not anticipated gets added, and you might find out you need to move from a CPT to custom tables. Which can be challenging.
This is the best explanation I’ve seen so far explaining why you should (or shouldn’t) use custom tables.
Earlier this year I did some testing with alternative databases too, for data which needs updated extremely frequently. Huge performance advantages can be obtained by just switching to a different database entirely (WordPress obviously itself still needs to stay on MySQL of course).
Did you use a different database server as well?
What I like is that you’ve got the experience to back this. I’m sure with the various plugins you’ve written you’ve made choices you came to regret and later needed to move in a different direction.
Those migration strategies are pretty tricky, so the idea of trying map out some of the potential use cases for both methods seems like a great place to start.
Having had those experiences has been tremendously valuable, even if regrettably so sometimes.
Great. Thanks Pippin.
fyi – It would be nice if there was some sort of industry standard (e.g., naming convention) for custom tables and associated fields / columns.
Great stuff Pippin, great stuff. You explained the EAV table structure problem of wp postmeta very well. This will help a lot of people.
Thanks.
Just read this and i must say, it was a great read.
Hello. I went through this too and discovered idiorm (GitHub). Pretty simple to use and it sticks to some WP Coding Standards 😉 So the code sniffer won’t argue about it.
As our plugin, CSS Hero, is storing a lot of informations on the database, we opted for going 100% your way: custom tables.
Way more efficient and easier to troubleshoot. Much better.
If I would have read your post before, we would have saved many cpu cycles 🙂
You really debunked a mith to me with this post. Thanks.
One other point to add to the list of criteria is search. With CPT your item can be returned as part of the WP search. Without, it’s pretty difficult (I’ve not yet found a way that works well anyway).
Very good point to consider. If the information you’re storing in the database is public (such as testimonials, job postings, or something us public facing), ensuring that information can be searched will be important. It will also be import to consider how well it will be supported by plugins that enhance WordPress search, such as SearchWP.
This is a timely topic worthy of much scrutiny and discussion.
I’m all for my own tables and intend to do so but I’m thinking there is going to be redundant data and felonious violations of normalization practices.
the problem of wp post meta you have explained in this article is really nice. I hope it will help lot of people here who will read this too.