News

How to Implement MySQL Sharding – Part 2

September 26th, 2011

In the previous post of this series (which can be found here) I discussed how to identify tables that can serve as good candidates for sharding.

Once you have decided which tables should be sharded (all the rest should be global tables), the choice of sharding keys is rather straightforward, as most will use the table primary key as the shard key. Of course, if multiple tables are sharded, and there is a foreign key relationship between these tables, then the foreign key will serve as the shard key for some tables.

Many people attempt to shard based on customer_id or a resource id, but I have seen how this usually fails in production environments. It is very hard to know in advance which customers belong together in the same database, and since customers can suddenly increase their traffic, this might create an unbalanced situation in which some shards are very busy while others are relaxed (see the details of last year’s FourSquare outage for some possible results of unbalanced sharding).

As with database partitioning, there are multiple algorithms available for sharding: hash , list, or range. Usually you’ll use list and range for multi-tenancy – saving customer information across different databases and maybe even different data centers. I’ll touch on that subject in a future post. But hash will probably give you the best results when it comes to sharding, as statistically it ensures that data is evenly distributed across all shards.

So after sharding configuration, what’s next?

If you have a new application you can skip the next section and just wait for the next post. But if you have an existing database, you’re stuck with huge amounts of data that you need to split.

We at ScaleBase ran a lot of tests and found that the following is the best mechanism for the initial data migration (BTW – if you use ScaleBase,– we handle and also optimize the data migration for you).

  1. Have the database cloned in all shards. It can be done by cloning a VM, or copying the physical files, or using mysqldump to export once and import to all shards.
  2. For each shard (on shard tables only):
    1. Drop all indexes.
    2. Delete the irrelevant data from the shard (this should be done by an automatic script of course).
      Note: This action creates a lot of fragmentation. You might consider creating temporary a table, inserting to it only the relevant rows, drop the original table and rename the temporary one to the real name
    3. Create all indexes.

In the next post we’ll talk about the programming language modifications that sharding requires.

2 Responses to How to Implement MySQL Sharding – Part 2

  1. Pavel says:

    Hi, I wonder about the best mechanism for the initial data migration … could you give some problems one may encounter when one doesn’t clone the whole db and remove parts of it? Thanks!

  2. Doron Levari says:

    Hi Pavel thank you for your note.
    Well, it’s all a matter of performance. The end-result is the same.

    Dumping the entire database, and then inserting only relevant data to each shard, is a very simple process and it works great, however it might be slow when handling large amounts of data.

    Note that not all tables are shard tables. Some tables are “global tables” (as explained here: http://www.scalebase.com/resources/documentation/table-policies) that anyway should be duplicated as a whole among all shards.

    Thanks,
    Doron

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Categories

  • Archives

JS and CSS Optimization by PHP Speedy JS and CSS Optimization by PHP Speedy