The Sharding Challenge
Historically, database sharding was used by DBAs and developers to split data from a single, large instance across multiple, smaller instances in order to scale out.
Sharding has long been seen as a cumbersome, labor-intensive task that requires ongoing application changes, which must be continuously addressed and optimized.
While sharding is not breaking news in the world of databases, what is noteworthy is Oracle’s clear recognition of it as the means to scale out. While solutions like MySQL Cluster do exist, it appears that Oracle felt the need to take the cluster variation of the highly priced, arbitrary data distribution sharding mechanism up a notch. Recognizing that splitting up data and sharding it into multiple partitions at random can cause retrieval issues, the newfound solution strives to offer more control. Oracle’s MySQL Fabric reflects the modern approach of how more complex versions of MySQL will look in the future.
So, What is MySQL Fabric?
“MySQL Fabric is an integrated system for managing a collection of MySQL servers and is the framework on which high-availability and sharding is built. MySQL Fabric is open-source and is intended to be extensible, easy to use, and support procedure execution even in the presence of failure, an execution model we call resilient execution.” MySQL Musings
Essentially, MySQL Fabric aims to be an elevated, modern alternative to MySQL Cluster, giving users more control over database sharding policies and over how the data is divided and handled. MySQL Fabric’s approach aids in sharding a database according to whatever custom policy is developed, which could be random, range-based, hash-based, or any other type of somewhat controlled policy. Think of Fabric as a directory service: the application asks Fabric where a data set resides and thereafter interacts with the appropriate partition directly.
However, even with this approach, a portion of the application code still needs to be continuously maintained and updated.
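To make the directory-service idea concrete, here is a minimal Python sketch of range-based and hash-based lookups. Everything in it is an assumption for illustration: the `ShardDirectory` class, the shard addresses, and the key ranges are hypothetical, and this is not the MySQL Fabric API. It only shows the kind of routing logic that ends up living in, or next to, the application.

```python
import bisect
import hashlib


class ShardDirectory:
    """Hypothetical stand-in for a directory service (not the MySQL Fabric API)."""

    def __init__(self, range_bounds, range_shards, hash_shards):
        # range_bounds[i] is the lowest key served by range_shards[i];
        # the bounds must be sorted in ascending order.
        self.range_bounds = range_bounds
        self.range_shards = range_shards
        self.hash_shards = hash_shards

    def lookup_range(self, key):
        """Range-based policy: find the last bound that is <= key."""
        idx = bisect.bisect_right(self.range_bounds, key) - 1
        if idx < 0:
            raise KeyError(f"no shard covers key {key}")
        return self.range_shards[idx]

    def lookup_hash(self, key):
        """Hash-based policy: a stable hash of the key picks the shard."""
        digest = hashlib.sha256(str(key).encode()).hexdigest()
        return self.hash_shards[int(digest, 16) % len(self.hash_shards)]


# Assumed layout: three shards, split by customer id in blocks of 10,000.
SHARDS = ["shard-a:3306", "shard-b:3306", "shard-c:3306"]
directory = ShardDirectory(
    range_bounds=[1, 10001, 20001],
    range_shards=SHARDS,
    hash_shards=SHARDS,
)

# The application asks where the data lives, then connects to that shard itself.
address = directory.lookup_range(12345)
```

In this sketch, if the ranges or the shard list change, the code that calls the directory (and whatever connects to `address`) has to be kept in step, which is the maintenance burden described above.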
Where Does ScaleBase Come in?
ScaleBase’s solution takes a different approach, relying on automation to tackle the dynamic, ongoing sharding process and removing the need to update the application code. Database distribution is a work in progress, with constant opportunities to simplify, speed up, and optimize. ScaleBase’s method aims to avoid the randomness of a MySQL Cluster configuration by uncovering the best way to break up various tables of data and distribute them accordingly across multiple shards.
What’s the Difference?
There are inherent differences between MySQL Fabric and ScaleBase’s approach to sharding. While MySQL Fabric distributes data based on whatever policy happens to be in place, ScaleBase analyzes the application to produce distribution policies matched to its needs, optimizing both distribution and configuration accordingly.
In terms of data retrieval, MySQL Fabric provides the location of the data; however, every time a change is made, the application code must be updated. Because it has a clear understanding of the customized distribution policies and of where the data is located, ScaleBase can retrieve the right data regardless of how many shards it may be split across. Further, ScaleBase’s application-aware smart distribution method can identify when data is located across multiple shards and perform the necessary cross-shard joins and aggregations, which is generally a very complex task when done manually.
ScaleBase removes all hurdles in the sharding process, including migrations, automation, and ongoing optimization. MySQL Fabric does acknowledge the need for sharding, yet it solves only a small percentage of sharding tasks.
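To give a sense of what a cross-shard aggregation involves when done by hand, here is a minimal scatter-gather sketch in Python. It is not ScaleBase’s implementation: in-memory SQLite databases stand in for MySQL shards, and the `orders` table, its columns, and the helper names are assumptions made up for the example.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one MySQL shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Assumed data: customer 1 has orders on both shards.
shards = [
    make_shard([(1, 10.0), (2, 5.0)]),
    make_shard([(1, 2.5), (3, 7.0)]),
]


def total_per_customer(shards):
    """Send the same GROUP BY to every shard, then merge the partial sums."""
    query = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    totals = {}
    for conn in shards:
        for customer_id, partial_sum in conn.execute(query):
            totals[customer_id] = totals.get(customer_id, 0.0) + partial_sum
    return totals


# Each shard only sees part of customer 1's orders; the merge step combines them.
totals = total_per_customer(shards)
```

A sum merges easily because partial sums can simply be added. Averages, distinct counts, sorted or paginated results, and joins between tables that live on different shards each need their own merge logic, which is where the manual approach gets complex.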
The Supermarket Example
To simplify these differences, let’s look at an example. Imagine you are in a large supermarket for the first time in your life. You have no idea where any items are located, so you ask the cashier at the entrance where the bread is. The cashier replies, “Aisle 4, at the far end of the store.” So you go to aisle 4, pick up your bread, and come back to the cashier to pay. This is analogous to the MySQL Fabric approach to sharding: you are told where something is located, and then you are expected to retrieve it yourself.
Using the same analogy, ScaleBase’s approach goes something like this. You are at the supermarket for the first time, but when you ask where the bread is, one of the store employees goes to find it and brings it back to you at the front of the store.
ScaleBase Takes The Prize
The interesting thing about sharding is that it assumes that the procedure being executed (i.e., requesting a specific piece of data) can be applied to any database. In a world driven by data management in modern databases, the same concept should apply to any database, whether it is sharded PostgreSQL, sharded MongoDB, or even an API requesting metadata from a PostgreSQL database and object data from MongoDB with the same API code.
A proxy approach similar to ScaleBase’s can navigate multiple sources of data to support MySQL, and can essentially be adapted to any type of data: SQL, NoSQL, documents, and everything in between. This places ScaleBase in a very strategic position behind the scenes, observing and controlling all of the traffic that travels between the application and the database. That position makes it possible to apply security policies, user access policies, and any other application-oriented functions and features that help optimize and better monitor the data flowing between the data source (e.g., database, data stream) and the type of application using that data (e.g., mobile, web, mainframe). ScaleBase’s approach makes a good attempt at normalizing all of the various layers involved in today’s battle with data distribution and retrieval.
ScaleBase acknowledges the advantages and disadvantages of MySQL’s latest database distribution service. It is refreshing to see a clear acknowledgement of the real need for efficient database sharding, considering it is the only truly viable long-term option for scaling out databases. MySQL Fabric is essentially MySQL Cluster’s method of scaling out using unified storage, which is a great time saver; however, significant limitations still exist. It is a good step forward, yet not quite a quantum leap. In that respect, ScaleBase takes the prize, offering an intelligent solution everyone can enjoy.
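As a rough illustration of why sitting between the application and the database is useful, here is a small Python sketch of a proxy that checks an access policy, records every request, and routes it to a backend. It is a toy under stated assumptions, not ScaleBase’s product: the `ShardProxy` class, the user and table names, and the modulo routing rule are all hypothetical, and plain dictionaries stand in for the databases.

```python
class ShardProxy:
    """Hypothetical proxy; plain dicts stand in for the backend databases."""

    def __init__(self, backends, allowed_tables):
        self.backends = backends              # list of {table: {key: row}}
        self.allowed_tables = allowed_tables  # {user: set of readable tables}
        self.audit_log = []                   # every request passes through here

    def get(self, user, table, key):
        # Observe: the proxy sees all traffic, allowed or not.
        self.audit_log.append((user, table, key))
        # Control: enforce the access policy before touching any backend.
        if table not in self.allowed_tables.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")
        # Route: an assumed modulo rule picks the backend for this key.
        backend = self.backends[key % len(self.backends)]
        return backend.get(table, {}).get(key)


proxy = ShardProxy(
    backends=[
        {"orders": {2: "order #2"}},  # even keys
        {"orders": {1: "order #1"}},  # odd keys
    ],
    allowed_tables={"web_app": {"orders"}},
)

# The application never learns which backend holds the row.
row = proxy.get("web_app", "orders", 1)
```

Because the policy check and the routing rule both live in the proxy, either can change without touching the application that calls `proxy.get`.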
We welcome everyone to have a look at our solutions, and to give them a try if effectively scaling out a relational database is important to you and your company.