What Is Sharding
Sharding is a database scalability technique that has proven itself in some of the world’s most popular MySQL web sites, applications and games. Simply stated, with a sharded database, instead of storing data in only one database instance, you distribute the data across multiple database instances, and potentially multiple servers. It’s a scale-out technique used by many SaaS companies, many of the most popular web sites social and mobile apps, and by many big multi-player games and enterprises for their customer-facing applications.
ScaleBase enables applications to cost-effectively leverage multiple MySQL instances working together
Historically, sharding a database required manually coding data distribution policies directly into your applications. Application developers would write code that stipulates directly where specific data should be placed and found. In essence developers were creating work-around code to solve a database scalability problem so their applications could handle more users, more transactions and more data.
Instead, what if you could leverage the expertise of MySQL sharding engineers who draw experience from hundreds of cases? Now you can, with ScaleBase.
Wikipedia defines sharding as:
Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than splitting by columns (as for normalization). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.
Using an example can help explain MySQL sharding more clearly, so let’s take the following table:
This is a small table containing a list of customers. Any database can handle such a table. But happens if instead the table has to store seven million rows instead of just seven rows?
Theoretically, this should not be a problem. But usually there are lots of operations on such a large table – for example we may have many read and write operations on this table every second.
In practice, a very large customer table can become a database bottleneck. Why? Because it doesn’t al fit in the database server cache anymore, because of database isolation management, and for other reasons that cause the database to crawl under load.
How does sharding solve MySQL Scalability?
If we take the customers table, and split it into four different databases, each database will contain 1.75 million rows. That’s still a lot, but less than 7 million rows. This will result in improved database performance. In fact, ScaleBase tests have shown about 75% improvements in response time in some standard performance tests. You can see the results here. The following diagram shows how such a table can be split:
Every database will get some of the rows. In old-fashioned do-it-yourself sharding, it was the developer’s responsibility to create an efficient, application-specific data distribution policy that efficiently stipulated exactly where each row should be stored and found for each table. Nowadays, that work is simplified and automated using ScaleBase software.
Unlimited Scale – Sharding allows MySQL to cost-effectively scale out to an unlimited number of users, while increasing performance and data throughput. If you have hundreds of thousands (or hundreds of millions) of users, you may wonder when your MySQL database will failure.
“At scale, everything breaks” (Google VP Engineering, Urs Hölzle, says here).
Using ScaleBase software, a standard open-source MySQL database can scale out indefinitely.
Response time improvement – If your database is big (over 50GB) or has many writes or reads per second (anything above a few hundred per second), sharding will boost your database performance.
Do-It-Yourself Sharding is Complex
Writing do-it-yourself sharding code is difficult. It requires you to rewrite most of your Data Access Layer from scratch and your home-grown sharding code needs to address a range of concerns typically handled by a database. In the above example we’ve seen one table split across four databases, which seems simple. But with a real application, a real schema and real data, it can be more complicated.
For example, how will you maintain relations between the tables? Some tables are shard tables, some are global tables. Also, your sharding code needs to maintain global (broadcast, duplicate) tables across all DBs. For every INSERT the sharding code must know which database to update, and a mistake means bad data placement which translates into lost data. Additionally, for every SELECT the sharding code must know where to find the correct data. and now what about JOINs, GROUP BY, ORDER BY, UNIONs?…
And while it’s difficult to do when you write your own SQL code, it’s even more complex when using O/R mapping tools, as most are not “sharding oriented”.
And after writing the initial do-it-yourself sharding code, you can run into issues. For instance, a common problem occurs when scaling requires adding more shards. Usually, internally written sharding code supports a fixed number of shards, and adding shards requires massive code rewrites – as well as the major downtime required when moving data chunks from one shard to another.
Other parts of the infrastructure also change when using a manually sharded database. For example, the reporting application must now be aware of the sharding logic, since you want to collect data from multiple databases rather than just one. But reporting applications expect to interact with one IP address and one database. With a home-grown sharding deployment, you’ll have to write the reporting application from scratch that understands cross-shard joins and data aggregation to compile results from multiple databases back into a single result. ScaleBase software does this for you and allows you to continue using off-the-shelf third party reporting tools.
With home-grown sharding, database backups are also an issue. And the complexities continue to pile on. So, while, MySQL sharding is a great solution for database scaling, it is complex and costly if done manually, as many of the costs are hidden and only become realized after the initial sharding is performed.
ScaleBase Automates and Simplifies MySQL Sharding
ScaleBase gives you all the benefits of MySQL sharding, with none of the headaches and without changing a single line of your code. And since ScaleBase’s software not embedded inside your application, your BI, DBA team and backup tools can use it, too which means no ecosystem changes.
ScaleBase also provides the ability to scale MySQL with read/write splitting. This means copies of the database can now serve for read operations, while only one database is used for the writes. Database replicas are no longer useless machines only used for high availability – with ScaleBase they can be used for scaling as well, and your architecture will look something like this:
In this architecture you get database high availability and scale out via data distribution or read/write splitting – all in a fully redundant environment that also boosts application performance and database throughput. Several deployment options exist, so check out our Technical Whitepaper here to see which would be best for you.