When it comes to handling large amounts of data, there is really only one way to reliably do it: batch processing. The concept of batch processing is simple. Instead of performing one large query and then parsing and formatting the data as a single process, you do it in batches, one small piece at a time. If you have ever attempted to query or export a large amount of data and had your server time out, you’ll easily understand just how beneficial batch processing is. While a batch processor is more work to build than a single query process, it is not all that difficult once you know how it works.

There are two kinds of batch processing I want to look at: one that works with a redirect that causes the browser to reload after each batch, and one that works with Ajax for a more polished and smooth user experience.

Let’s look at the simpler-to-build, redirect-based processing first.

For this simple example, we will look at my Batch Comment Spam Deletion plugin. It uses batch processing to delete large numbers of spam comments without the server timeouts or crashes that the native Empty Spam tool in WordPress core can trigger.

The plugin is pretty simple. There are two primary components to the plugin:

  1. A step processor. This is simply a function that detects if there are spam comments to delete and then deletes a small number of them. Each set is considered a batch.
  2. A step trigger. This is a small piece of Javascript that handles the transition from step to step. Each time a new step is started, the function above is called.

In my Batch Comment Spam Deletion plugin, the batch processing looks like this:

It does one hundred comments per batch and after each batch is processed, the page redirects so that the next batch can be completed. This is repeated until all spam is deleted.
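To make the idea concrete, here is a minimal sketch of the batching logic on its own. It is not the plugin’s code (the real step processor is PHP running inside WordPress). The in-memory array and the names processBatch and runAllBatches are stand-ins I’m assuming purely for illustration.

```javascript
// Hypothetical in-memory stand-in for the spam queue; the real plugin
// deletes comments in PHP through WordPress.
const BATCH_SIZE = 100;

// Deletes up to batchSize items and reports how many are left.
function processBatch(spamIds, batchSize = BATCH_SIZE) {
  const deleted = spamIds.splice(0, batchSize); // removes from the front
  return { deleted: deleted.length, remaining: spamIds.length };
}

// Runs batches until the queue is empty and returns the number of steps.
function runAllBatches(spamIds, batchSize = BATCH_SIZE) {
  let steps = 0;
  while (spamIds.length > 0) {
    processBatch(spamIds, batchSize);
    steps += 1;
  }
  return steps;
}
```

In the plugin there is no loop like runAllBatches; each pass through the loop corresponds to one page redirect, which is what keeps any single request short.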

The code for this is pretty simple as well.

First we start with a button that takes us to a specific page:

The button links to a simple page that has a small piece of Javascript embedded on it:
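As a rough sketch of what such a script might do, the helper below builds the URL for a given step. The query parameters mirror the URLs listed further down in this post, but the function name buildStepUrl and its arguments are my own assumptions, not the plugin’s actual code.

```javascript
// Builds the URL for a processing step. The parameter names mirror the
// URLs in this post; the function itself is hypothetical.
function buildStepUrl(adminUrl, step, total, nonce) {
  const params = new URLSearchParams({
    action: 'pw_bcsd_process',
    step: String(step),
    total: String(total), // empty on the first step, before a total is known
    _wpnonce: nonce,
  });
  return adminUrl + 'edit-comments.php?' + params.toString();
}

// In the browser, the embedded script would then do something like:
// window.location = buildStepUrl('/wp-admin/', 1, '', nonce);
```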

The Javascript is key. It triggers a redirect each time the page is loaded and instructs WordPress which processing step we are on. When the redirect happens, this function is fired:

Let’s break it down in order:

  1. The page at wp-admin/edit-comments.php?page=pw-bcpd-process is loaded
  2. The Javascript redirect fires and sends the browser to wp-admin/edit-comments.php?action=pw_bcsd_process&step=1&total=&_wpnonce={nonce}
  3. The process_batch() function (which is tied to admin_init) fires and processes the first batch. Once the batch is processed, it redirects the user back to wp-admin/edit-comments.php?page=pw-bcpd-process&step=2&total={total}
  4. The Javascript redirect fires again and sends the user to wp-admin/edit-comments.php?action=pw_bcsd_process&step=2&total={total}&_wpnonce={nonce}
  5. The process_batch() function fires and processes batch number 2. Once completed, the user is sent back to wp-admin/edit-comments.php?page=pw-bcpd-process and the process repeats again and again until all spam has been deleted.
  6. As soon as process_batch() fails to detect any more spam comments, the user is redirected to wp-admin/edit-comments.php?message=batch-complete and the process is completed.
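The decision made at the end of each step can be modeled in a few lines. This is only an illustration written in Javascript. The real decision happens in PHP inside process_batch(), and the name nextRedirect and the spamFound argument (how many spam comments the step found when it started) are assumptions of mine.

```javascript
// Decides where the browser is sent after a step runs.
// `spamFound` is a hypothetical count of spam comments the step detected.
function nextRedirect(step, total, spamFound) {
  const base = 'wp-admin/edit-comments.php';
  if (spamFound === 0) {
    // Nothing left to delete: finish with the completion message.
    return base + '?message=batch-complete';
  }
  // Otherwise go back to the processing page for the next step.
  return base + '?page=pw-bcpd-process&step=' + (step + 1) + '&total=' + total;
}
```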

Using this redirect method works very well, especially considering it’s rather trivial to build. This method can handle massive amounts of data on most servers but will occasionally fail on specific server configurations.

A better option than redirects, however, is to use Ajax to handle all of the processing so that there are no page reloads.

My team and I just released the first beta version of Easy Digital Downloads 2.4. In this release, we’ve added batch processing to our CSV export options. This has allowed us to improve the data that is exported and also provides us with greater reliability when exporting large amounts of data.

When processing, our export options look like this:

[Screenshot: the export screen with a progress bar and spinner]

It shows an accurate progress bar and a spinner icon to indicate that WordPress is processing the export. As soon as the progress bar reaches the right side of the box, the export is finished and the file download dialog is presented to the user.

There are several distinct advantages to using Ajax processing instead of a redirect:

  1. Even with the Javascript redirect methods, some sites still struggle with redirect loops
  2. Reloading the page on each redirect is slower
  3. Ajax processing provides fast, accurate, visual feedback to the user

Overall, both batch processing methods work about the same. They both iterate through the data one step at a time. The only real difference is how the progression from step to step is handled.

Rather than triggering the export process by loading a page, the Ajax version can be triggered by clicking a button that is intercepted by jQuery. In Easy Digital Downloads, that looks like this:

This reveals the progress bar and then fires the function process_step(), which is then responsible for sending a request to the server that is picked up by PHP. The process_step() function:
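Here is a simplified sketch of that idea rather than the exact Easy Digital Downloads code. To keep it runnable outside the browser, the Ajax call is passed in as a hypothetical send( payload, onResponse ) function; in WordPress it would wrap something like a jQuery POST to ajaxurl. The action name follows from the wp_ajax_edd_do_ajax_export hook, while the other payload keys and the onProgress callback are assumptions.

```javascript
// `send(payload, onResponse)` is a hypothetical transport; in the browser it
// would wrap an Ajax POST to WordPress' ajaxurl.
function processStep(step, data, send, onProgress) {
  const payload = { action: 'edd_do_ajax_export', step: step, form: data };

  send(payload, function (response) {
    if (response.step === 'done') {
      onProgress(100, true); // export finished
      return;
    }
    onProgress(response.percentage, false); // update the progress bar
    // Not finished yet: request the step the server told us to run next.
    processStep(parseInt(response.step, 10), data, send, onProgress);
  });
}
```

Passing the transport in as an argument is just a convenience for the sketch; it lets you try the stepping logic with a fake server before wiring it to a real Ajax request.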

The Ajax request sent by this script is processed by a function tied to wp_ajax_edd_do_ajax_export. If you’re not familiar with how Ajax works in WordPress, check out my posts on the subject. Our PHP processing function looks like this:

There are parts of this function that are irrelevant to this post since they are very specific to Easy Digital Downloads, so we will look past those and just analyze the general concepts.

First, a few parameters are set up to control the date range, the status, etc., for the data to be exported. These get passed to our database query in a moment.

Second, we fire two functions:

$ret = $export->process_step( $step );
$percentage = $export->get_percentage_complete();

The process_step() function is nearly identical to the one we used above in Batch Comment Spam Deletion, so there’s no need to look at its definition. It is just a simple WP_Query. What is important here is how the response from process_step() is handled.
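As for get_percentage_complete(), one plausible way to compute such a value is to compare the rows handled so far against the total. The sketch below is an assumption about the general approach, written in Javascript for illustration; the names percentageComplete, perStep, and total are hypothetical and the real method lives in PHP.

```javascript
// perStep and total are hypothetical inputs: rows handled per batch and
// total rows matching the export query.
function percentageComplete(step, perStep, total) {
  if (total <= 0) {
    return 100; // nothing to export counts as finished
  }
  // Cap at 100 because the last batch is usually only partly full.
  return Math.min(100, (step * perStep / total) * 100);
}
```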

If it returns a truthy value, it means we have additional data to export, so we need to send back a response to our Javascript that indicates it is time to process the next step. We also send back the next step number.

If a falsy value is returned, we send a “done” response.
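The shape of that decision can be sketched like this. It is written in Javascript only to illustrate the logic; the real handler is PHP, and the name buildResponse and the exact response fields are assumptions. A real handler would likely send more, such as where to fetch the finished file.

```javascript
// `hasMore` stands for the truthy or falsy result of process_step().
function buildResponse(hasMore, step, percentage) {
  if (hasMore) {
    // More data to export: tell the client which step to run next.
    return { step: step + 1, percentage: percentage };
  }
  // Nothing left: signal completion.
  return { step: 'done' };
}
```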

Looking back at our Javascript, we see this check:

What’s important to notice here is how it’s actually a recursive function. If the export is not complete, we call the process_step() function from within itself:

self.process_step( parseInt( response.step ), data, self );

This starts the next step and will continue to repeat the process until response.step is returned as “done”.

By building a batch processor for our export system, Easy Digital Downloads is now able to easily export massive amounts of data, even on cheap servers. We have personally tested it with over 15,000 records and it worked flawlessly.

Any time you might need to handle massive amounts of data, you should consider using batch processing to ensure better reliability.

For Easy Digital Downloads, we also use batch processing to handle database upgrade routines. You can see an example of that with an upgrade we had to run on the customer database for EDD 2.3.

Have you written a batch processing system? Does it look similar? If you have any questions, suggestions, or comments, do not hesitate to leave them below.

  1. Jason

    Love reading posts about solving potentially complex problems! Keep em coming.

  2. Eric Daams

    Thank you Pippin! This is tremendously helpful.

    I recently wrote a batch processor that basically works the way your first example does. The AJAX route looks like a far more stable way to move forward though, particularly since it makes for a much cleaner UI for the user.

    One question. What is the value of this?

    $_REQUEST = $form = (array) $form;

    Further down, both $_REQUEST and $form are used. Could you just get away with having either $form or $_REQUEST?

    Cheers,
    Eric

  3. Yasar

    Hi,

    Great work. Saved my life… I had been searching for two weeks for an article like this. I am very glad and thankful to you…

  4. Jeremy Green

    Thanks for writing this post. It came in very handy on a recent client project where I needed to batch process emails via ajax.

  5. ron

    Thank you Pippin for this helpful post. Keep up the good work!

  6. Yudhistira Mauris

    Thank you Pippin for the post. It really helps me to understand how batch processing works. I just realize how important this is after working with 100,000 rows of DB data.
