In the previous MySQL sharding challenges article, I talked about three challenges that you will face with do-it-yourself database sharding. These are; “sacrificing development agility”, “maintaining database operations and multiple IP addresses within the application”, and “breaking the MySQL ecosystem because the application conceals sharding strategies internally”.
In this article, I’d like to discuss three more challenges related to database distribution.
At the end of this article, you should have a better understanding of these three challenges and ways to work around them.
4. No Real-time Database Sharding Policy Validation – Solved!
One of the problems with do-it-yourself database sharding is that it can be hard to get real-time sharding policy validation. That is, how can you know if the policy is being followed uniformly across all parts of the application, and how can you know if the policy is efficient and optimized to avoid any bottlenecks or hotspots?
To get this type of information from DIY sharding, you would have to add code into your sharding to track all of this and then create dedicated reports for it. A good example of this would be detailed traffic reports that show things like; cross-references, master versus global versus shard table access, ONE_DB versus ALL_DB access data plans, reads versus writes, and specific database usage. But, do you really want that type of overhead in your application?
With ScaleBase, your MySQL shard policies are declared, managed and enforced from outside of your application. Your applications are unchanged. Instead, all MySQL shard policies are updated and enforced from, in ScaleBase, which provides real-time MySQL data traffic monitoring and reporting to identify and isolate hotspots and bottlenecks. When a hotspot or bottleneck is discovered to be forming, ScaleBase can help you fix the problem before things become a problem.
5. DIY Sharding Requires Schema Denormalization – Solved!
To use our school example from earlier blog entries, in the diagram below we can see the national school system and how it manages information for all 50 states. There would be a lot of information flowing through this hierarchy with 50 states, millions of students, and billions of grades. In most cases, the data would be accessed according to this hierarchy.
So – state_id would be a good candidate for sharding key, and I would want to achieve a state where all towns, districts, schools, all the way to grades, of the same state – are on the same database.
With DIY database sharding, you may be forced into schema denormalization:, putting state_id in all tables so every row in every table will know which database it needs to go to. After all, when I have a student record in my hand, how do I know which database I have to INSERT it to? When I want to query all grades in “Chester A. Arthur” elementary school in Manhattan, how do I know which database to go to? Database 1 or database 5? A state_id column will help me with those decisions.
But schema denormalization can greatly add to the complexity of your database and can potentially lead to performance problems. Not only is this a gargantuan task for existing gigabytes of data, but DBAs now have to make sure the denormalized databases do not become inconsistent in the future. This means DBAs must tell the system how redundant copies of the data (i.e. the STATE_ID in all tables below the state level) must be synchronized, in the case of frequent updates.
ScaleBase software eliminates the need to denormalize MySQL data. ScaleBase instead deploys a cascading shard key hierarchy. I wrote a fairly in-depth entry on cascading data distribution keys which you may want to explore.
6. DIY Sharding Can Result in Database Silos – Solved!
A successful and scalable shard policy will distribute data in a way that stores together data that is most often accessed together. This streamlines your heaviest and most frequent workloads as queries can complete within a single shard. But you may have an important minority of queries and reporting that needs to gather data from multiple shards. For these queries and reports, the data can appear to be in separate database silos. With homegrown DIY MySQL shards, dealing with this can be quite challenging.
ScaleBase Data Traffic Manager eliminates the data silos problem. When you have a query that needs data from several databases, ScaleBase will run the query in parallel on all databases, including cross-db GROUP BY, ORDER BY, aggregate functions and cross-db JOIN operations. ScaleBase aggregates results into one meaningful outcome, which is sent to the application just as if it came from a single MySQL database. ScaleBase also enables two-phase commit for transactions spanning multiple databases.
ScaleBase software simplifies sharding, so DBAs and developers can focus on other work. It provides real-time sharding policy validation and data traffic visibility to stay ahead of and resolve and distribution policies issues. It eliminates the challenge of database silos with cross-database operations, which are handled within your database architecture, removing complexity from applications.
Stay tuned for the next article in the series, where I will cover three more challenges that DIY sharding developers may face.