Many challenges arise when preparing to scale out a relational database. It is essential, however, to first take into consideration whether or not the application contains the necessary elements that support and facilitate a scale-out approach, and the data model is the first place to look. For example, if database sharding is used as a scalability technique, the jumping-off point would be to identify whether or not the application can indeed be sharded and, therefore benefit from a scaled-out database. In principle every application has the potential to be scalable, it all depends on the data model.
Let me explain.
What is Scalability?
Every company starts small, and then grows according to the success of its product or service. As a result of users concurrently connecting to the database and performing an assortment of actions, the amount of data and database transactions per second increase rapidly. Success is good! Naturally, your next step is to ensure your database can scale to meet the growing workload requirements of your application. Scaling-out your database is an excellent approach to counterbalance the influx of workload and throughput to handle the growing number of questions submitted and answers received from the database.
The key idea behind scaling out a database is to facilitate an environment where both the workload and throughput can grow independently and simultaneously while the application maintains a high level of performance and availability. I have written a series of in-depth articles specifically about optimizing data distribution that you can dive into for more detail.
A database can comfortably handle a certain number of transactions per second. However, when that number is multiplied exponentially, the proper system needs to be in place to cope with the increased demand. One option is balancing the transactions across a distributed database, deployed across many smaller servers. A distributed database can more efficiently handle workloads at a fraction of the cost of a using a single, much larger database server.
While using multiple databases is a key factor in scaling out, how the data is distributed has an even bigger effect on successful database scalability. This is a very important issue that is not discussed enough. Perhaps that’s because only ScaleBase has capabilities to craft a unique data distribution policy that is tuned to match your unique application and data model.
And critically, if you’re scaling out a MySQL database, you want to maintain ACID compliance and leverage SQL properties.
Let’s look deeper
If queries require data from several different databases in order be fulfilled, it defeats the purpose of distributing data in the first place. The data distribution is meant to decrease the number of queries that any particular database receives. So, if a query needs to pull data from 10 different databases and there are 10,000 other queries that do the same, each database is being used at least 10,000 times, which may or may not be within its limits, and certainly might slow down the response time. The same situation applies for update transactions. Since updates need to be made to each database, this locks the databases and eliminates the scalability benefit.
There is a light at the end of the tunnel, however. If the data is distributed in such a way that each query and transaction are satisfied by a single database, then the data model used by an application is considered to be successfully scaled and in the most efficient manner.
Database Distribution is a Challenge
Data needs to be distributed in a specific manner due to the fact that queries are not easily distributed. Queries require data to be fulfilled, and therefore will go wherever the necessary data is located. Therefore, data distribution should be intelligently done to promote efficient query distribution and fulfillment.
So, how can you identify the best data distribution strategy for your application, data model, schema, and database?
You need to examine the nature of your data, data relationships and the functional use of your data. Queries need to be examined in depth. You need to identify query patterns. Which queries are most commonly executed? Which data is primarily accessed together? You will need to look at the main flows of your application.
It is not enough to simply distribute data according to the usual suspects for classification (i.e. User ID, Profile ID, Store ID, Account ID). Identifying the data hierarchy and storing related tables together is necessary as well for query fulfillment (i.e. storing a user’s information along with related documents and comments).
A properly scaled out database will aim to have more than 90% of queries fulfilled from the data found in a single database. The remaining 10% will require some aggregation of data between databases, which is fine.
For instance, let’s consider an example: querying comments on a document that belongs to specific users. If a query requests information about the individual users who wrote the comments, a different database (which contains that information) may have to be brought into the query. These types of queries will impact the shards involved for a certain period of time, however, the impact is minimal so long as these types of queries only make up 10% of the overall load.
Doing the analysis that is required to pin-point the best data distribution policy for your unique database and application workload can be complicated and slow work. It can involve iterative, trial and error analysis as you progressively refine the policy to better distribute data so you more evenly distribute workloads. Unfortunately, if you do all this analysis by yourself, until you deploy your distributed database, you won’t know the true impact.
Luckily, there is a faster and simpler way.
Get Your Scalability Score – ScaleBase Analysis Genie
The Analysis Genie, ScaleBase’s newest SaaS offering, guides you through the entire process to isolate the best data distribution strategy for your database and application workload. First, it helps you identify and capture the data required for analysis. Then it helps you understand and interpret the data to so you can recognize the application’s requirements as well as grasp the database schema structure and table sizes.
The Analysis Genie then digests all of the information, and gives you a way to visualize the entity hierarchy structure. The Genie then generates brilliant suggestions on how to group related data together in one database. Simulated queries are executed on trial databases. This uncovers the percentage of the workload that consumes one database at a time, and the percentage that consumes more than one database, but that does not compromise the efficiency of the application as a result. Ultimately, the Analysis Genie gives you a score, reflecting how well the data can be distributed; we call is the Scalability Score.
Experience Counts – And We’ve Packaged It for You
ScaleBase has handled a large number of applications and relational data models that have been deemed un-scalable, when in reality the data was simply distributed incorrectly. Every application’s potential scalability is revealed by its Scalability Score, which indicates the optimal data distribution method. What is more, ScaleBase Analysis Genie provides Scalability Scores and data distribution impact analysis that removes the guessing game before any actual changes or distribution of your database.
In the end, we can all agree that not all applications are scalable. However, for the majority that are, the effort lies in identifying the right methods and policies needed for each to be scaled efficiently.
Arbitrarily throwing data into different databases is simply not scale-friendly and only increases the amount of limitations. ScaleBase’s Analysis Genie allows you to learn about your application’s data and identifies the appropriate data distribution policy that will allow your environment to efficiently scale out. What you do from that point forward is up to you. Whether you want to hardcode into your application the data distribution policy for your database, or export the policy to ScaleBase so you can easily control and update the policy to stay ahead of evolving requirements of applications and users.
Would you like to try it? The ScaleBase Analysis Genie is in Open Beta. You can get started here: http://www.scalebase.com/analysis-genie/