The monster that is a poor database schema

Step back in time two, three, four, or even 10 years and take a look at the development decisions you made then. What do you notice about them? Unless you are a one-in-a-million statistic, you probably look at those past decisions and say to yourself what was I thinking?! Why did I do it that way?! Welcome to the real world of actual development.

As developers, we grow and learn over time; we get better at making design pattern decisions; we get better at writing performant code; we get better at all aspects of development.

Take a look at any project that has been around for a number of years and you will find gremlins hiding in its shadows and crevices. There will be internal APIs that are convoluted; there will be data structures that make zero logical sense; there will be function names that seem asinine; there will be blatant problems and it will appear that these are the results of poorly made development decisions. While this is sometimes true, it is far more likely that these gremlins are actually the result of inexperience that leads to a lack of foresight and understanding of the future consequences of non-well-thought-out designs.

Smooth resolutions of bad design patterns

Imagine a project that begins as a small, internal system for doing one thing and only one thing, and imagine it as a project you build specifically for yourself. Due to the nature of it being a small, personal project, it is likely that you will take short cuts; it is likely you will make some decisions simply because Y provided a quicker solution than X; it is likely that you will name variables or API methods poorly; and it is guaranteed that you will make some decisions that have a severely negative impact on your small, personal project four years later when that project has grown far beyond a simple, personal project.

This is the reality of the real development world and the truth for all projects that grow over time. Poor data schemas and difficult APIs are the skeletons in our closets, the spider webs behind our furniture, and the ghosts in our machines. They exist in every project and are a natural product of development growth.

The real achievement is not in building a project with zero gremlins, it is learning how to get past those weaknesses and poor decisions in a smooth way that has little to no negative impact on the users of the project.

Let’s go back to the imaginary project above for a moment. Assume that when first building that project you made the decision to store large amounts of data in a poorly designed database schema, or perhaps even a database with zero design schema that applies to your project. At the time this database schema worked fine because it was easy and, after all, it was only you using the project, so who really cares? Now fast-forward four years and imagine that your project is now used by over 50,000 websites and tens of thousands of users and hundreds or even thousands of developers. Each of these users makes use of the project in a slightly different way and each of the developers builds new tools on top of the project. At this point those poor design decisions (or perhaps even the complete lack of a “design” decision) begin to have negative effects on the project by reeling their ugly heads and presenting your users with severe limitations and scaling issues.

Bad data schema designs can result in severe performance issues. Poor API design can make it difficult for other developers to use or extend the project. Poorly thought out relationships within your code and your database can become the elephant in the room that no one wants to talk about but are abundantly clear and really start to get in the way.

The question that all developers need to ask at some point is this: how do we get past the design decisions of the past so we can continue to grow and excel in the future?

This is precisely what my team and I are working on for Easy Digital Downloads now.

Quick and easy in the beginning

Four years ago, when Easy Digital Downloads was brand new, I made some poor design decisions related to the database structure used in the plugin. Relationships between various pieces of information stored by the plugin were created haphazardly and we chose to rely on the data structures provided by WordPress core. This means that all of our eCommerce data (payment records, order items, order meta data, customers, etc) were stored in the wp_posts and wp_postmeta table. At the time this worked fine. It was easy, quick, and more than flexible enough for what we needed. What I failed to see, however, was just how cumbersome storing eCommerce data in the core posts table was going to be once the plugin scaled up to a lot of users and large websites processing significant sales volumes.

The decision to use wp_posts and wp_postmeta for our eCommerce data is a decision I regret and one that has created significant challenges for us, but none of the challenges are so significant that we cannot get past them.

When faced with the reality of bad data schemas, there are really two ways to address the problem:

Simply live it with and do what you can to mitigate the problems
Work out a plan for resolving the problem completely by re-building the data schema from the ground up

The first option is the easier of the two for many reasons. First, it requires the least amount of change. Second, it avoids the significant risk of severely breaking backwards compatibility. Third, it does not require any cooperation with third party developers that have built on top of your bad data schema.

Option two, however, can be much better for the health of the project in the long run. It does, however, present a serious risk to the project’s health and continued adoption by users and developers. When making significant changes, backwards compatibility must be kept an absolute priority. If backwards compatibility is ignored or implemented poorly, it’s very possible that the migration process to the new and improved schema will go poorly, resulting in developers and users abandoning future reliance on the project. It can also cause a support nightmare if hundreds or thousands of websites are suddenly breaking due to the changes implemented.

Building a backwards compatibility layer can be challenging, but it is a challenge that will be worth it in the end. Frankly, I would go as far as to say you should not even consider resolving a bad database schema if you do not plan to also introduce and maintain a complete backwards compatibility layer. Choosing to ignore backwards compatibility in a scenario like this is negligent and harmful to your users. With that in mind, how does one go about building a backwards compatibility layer? There are really a few parts of it.

Abstraction layers for backwards compatibility

The very first step in providing backwards compatibility is to ensure there is an abstraction layer for your database. An abstraction layer is simply an API for interacting with the database. It provides developers standardized methods for reading and writing to the database without writing actual queries. For example, WP_Query is an abstraction layer for the wp_posts table that provides methods for querying data from the posts database without writing any actual SQL. Why is this valuable? There are numerous reasons but for this particular discussion, it provides project maintainers the ability to change the database schema without disrupting external projects that utilize the data.

In Easy Digital Downloads, we have built abstraction layers for payments, customers, and products. These abstraction layers are fundamentally important when it comes time to change the underlying database structure.

Let’s look at a quick example.

Assume we wish to retrieve the first and last name of a customer record. In the current version of Easy Digital Downloads, both the first and last name are stored in a single column in the database, but perhaps in a future version we decide to separate them into two columns. Through the EDD_Customer object, getting the name of the customer is simple:

$customer = new EDD_Customer( 47 );
echo $customer->name;

That will output the customer’s full name, such as Elizabeth Johnston.

Where’s the value in this abstraction layer? well, it becomes very apparent (at a simple level) when we consider the following possibility.

Assume now that the EDD_Customer object was not originally available so a third party developer decides to directly query the database for the customer’s name:

echo $wpdb->get_var( "SELECT name FROM edd_customers where id = 47 LIMIT 1;" );

Since storing both the first and last name in a single column was probably a poor decision, we later on decide to separate the names into two columns, first_name and last_name. In this scenario, the first example, which relies on the abstraction layer of EDD_Customer, will continue to function exactly as is. The second example, however, will suddenly fail because the name column no longer exists.

This is a simple example but it does accurately illustrate the importance of having abstraction layers. Consider now how important it will be when you’re preparing to change not only a single column in the database but the entire database. Every single column. Without a proper abstraction layer, making that transition will be nearly impossible.

If an abstraction layer isn’t already present, build one immediately. That’s the very first step anytime a database schema needs to be changed.

After you have an abstraction layer in place, you need to work hard to ensure that everyone uses it. If a platform has been around for a while, it will be necessary to ~~push and shove~~ work hard to encourage developers to update their code to use the abstraction layer. This is something we’ve begun to do for the recent introduction of EDD_Payment.

With the creation and adoption of a good abstraction layer, the process of migrating to a good database schema becomes a lot simpler, though it is still a very, very significant task that has a lot of challenges. For example: how does a project maintainer account for all of those developers that ignored or simply didn’t see the news about the abstraction layer? Or how about all of the project’s users that did not update to the latest versions? For those, the best one can do is provide as much backwards compatibility as possible.

For Easy Digital Downloads, building a backwards compatibility layer will involve a number of factors. First, we will have to intercept and re-route every single call to get_post_meta() that is made against all EDD payment metadata. Thankfully, the WordPress metadata API includes number filters and action hooks that make this possible. Second, we will have to intercept and re-route every query to the wp_posts table that contains the edd_payment post type. Again, the prevalence of filters in the WordPress core codebase will provide ample ways for us to do this. Third, we will also have to intercept and re-route every write and deletion to the wp_posts and wp_postmeta tables for all EDD-related queries.

Slow and careful

This kind of migration process takes a long time and needs to be executed with extreme care. We will likely spend 6-12 months building this backwards compatibility layer. The most significant challenge for it will not be writing or handling the re-routing of queries; the real challenge will be finding and knowing all of the data points that we need to include. For example, we know very well what all of the meta_key values are that we use in Easy Digital Downloads and all of the officially maintained extensions. What we don’t know, however, is the meta_keys that third party developers have used in their own extension. There are some assumptions we can make, such as assuming that any meta_key containing “edd_” belongs an EDD plugin, but we’ll never be able to cover 100% of the data out there.

In the end, there should be several goals in defeating the monster that is a bad database schema:

Introduce a new and well thought-out schema that resolves all problems the original schema created
Introduce and maintain complete abstraction layers for the database schemas so that future changes are less difficult
Make the transition from old to new schemas as smooth and invisible as possible
Protect the user base that does not have the luxury of updating or is simply unaware of updates by providing complete backwards compatibility

There is no reason poor database schemas cannot be improved, they just have to be done so slowly and with great care.

Note: would you like to learn how to build a database abstraction layer or read more about the reasons for why you should use custom tables in WordPress? I have a complete tutorial series on the subject.

maurisrx
May 17, 2016

Great article Pippin! It gives me a new insight as newbie developer on how I should tackle database schema.

Almost all plugins I’ve developed use default posts and postmeta tables to store data because I think my plugins will only store small amount of data and it’s easier and faster to develop since there is already available WP API.
Joan Boluda
May 17, 2016

Love it! Thanks so much for keeping us informed! This is really the way to go. I’m sure the upgrade routine will be awesome, and in the long run, this will benefit ALL of us. If you need beta testers, let us know! 🙂

BTW, I really love this kind of posts, thanks Pippin!
Daniel Iser
May 17, 2016

Just went through this for my plugin Popup Maker, pretty much spot on.

We also changed naming conventions which meant we had to reroute old hooks and filters to new ones as well as all of deprecated functions rerouting to new methods.

We don’t have nearly the external development of EDD but we were able to maintian a great backward compatibility layer even in JS. Over 25k migrated so far and no breakages due to migration. ;).

Huge +1 for outlining this for others. May be good to include function and hook/filter remapping texhniques here as well.
- Pippin
  May 17, 2016
  
  Congrats on the successful migration!
butlerblog
May 17, 2016

Great post, Pippin! I’ve definitely been there – and even with experience under my belt find that I am guilty of similar sins (albeit hopefully less frequently). There have even been some times that I’ve looked at some code and thought, “who wrote this crap?” only to figure out later that is was my own from years past.

As usual, you’ve laid out some great concepts here. It is especially timely as I am working on db transaction table and I’ve been going through your tutorial as an example – it has been very helpful!
Cyrus
May 17, 2016

Thanks for the good read.

“The decision to use wp_posts and wp_postmeta for our eCommerce data is a decision I regret and one that has created significant challenges for us, ”

And yet I have read elsewhere to always do just that..Thank you for validating what I thought when i read that. Key value scheme is not the right method for every situation. I was not crazy.
Karissa Skirmont
May 17, 2016

Wow what a challenge Pippin!
While not exactly the same, I have looked back on my past work in graphics or just simple website I’ve built in general and wondered “What in the heck was I thinking‽‽‽”

I’ll be aware and watch for the updates as they roll out for my clients sites.

I have full confidence that you guys will sort it out in the best way for all. 🙂
hivladimir
May 17, 2016

We are moving Live Composer page builder from php to JavaScript and your article is right in time. Thank you. We have only ~20k plugin users but migration to the new plugin generation still makes me feel bad. I can’t even imagine what you will go through, but it surely better to do now then latter. It will be great to see how you’ll solve the problem. Looking forward to read more on this.
Puneet Sahalot
May 27, 2016

That’s a big and bold move, Pippin!
Much needed. I have been a long time user and developer for both EDD and WooCommerce.
I believe, Woo needs a new database approach as well. There were debates going on this https://github.com/woothemes/woocommerce/issues/9735 but not sure if those will come to effect or not.
Thanks for outlining your experience. 🙂
- Luke Cavanagh
  January 17, 2017
  
  WooCommerce did shift since CRUD classes went into WC 2.7 core, so it does support customers, orders and products.
Sandesh
June 4, 2016

Yeah, on personal projects many of us take shortcuts. Big plugins like EDD should have custom database tables for easy management and better DB Schema.
Murli
July 15, 2016

These is nice information, I really appreciate the content, which is cover over here, Fantastic.

Thank You
School ERP
May 8, 2018

Really Nice post, waiting your new blog…