We have witnessed a significant change in the handling and consumption of data over the past few decades, from the earliest data stores to modern-day databases. The first data stores were designed to save data to a file so that it could later be retrieved, typically sequentially, from a single storage device attached to the mainframe, which drastically limited how much data could be stored.
The approach to data and databases changed radically when client-server technology broke through, bringing the ability to distribute data and applications along with an ever-greater need for relational data manipulation. This became common practice in use cases such as banks, large enterprises, and information systems such as ERP, which typically require ACID guarantees (Atomicity, Consistency, Isolation, Durability) and a single source of truth for data. In time, these organizations needed to link different data stores, or data repositories, requiring access to a wide range of data associated with a single query subject. The challenge consequently arose of manipulating multiple pieces of data from a distributed database using the traditional relational model.
What has Changed
There have been three significant changes in our relationship to data and databases over the past 10 years. The first relates to the complexity of the data in terms of where it is located within the database. Rather than simply requiring a customer's first name and last name for a query, an application may also need their address or recent purchases, including an itemized list of where each product came from and its manufacturer. Moving from a single machine to a distributed system, the different pieces of data can exist across multiple partitions or physical locations, where maintaining ACID guarantees becomes significantly harder.
The next change was a result of the second internet (mobile) revolution, which altered the types of data being stored and how they are used. For example, new technologies have made it possible to record high-definition video and take high-resolution pictures, data that can be very complex and require far more effort and manipulation than was anticipated in the past.
The final change in the data dilemma pertains to the tremendous amount of data associated with these modern-day services and applications. The number of data pieces and sources has grown so significantly that traditional databases are not capable of manipulating so many pieces of information, let alone in real time.
Open Source Still Rules
Within this whole notion of modernization and the varied data types and sizes touched upon above, there are supplementary aspects that have not been mentioned yet play just as significant a role in the future of databases. Open source databases have developed immensely over the past decade, giving businesses of all sizes a far more cost-friendly option for data storage and manipulation than the sky-high costs of traditional database licenses. Enabling a modern pay-as-you-go approach has made traditional licensing contracts seem prehistoric.
Two databases currently leading the open source field are MySQL, which was eventually acquired by Oracle yet continues to operate under a free open source license, and PostgreSQL, a completely public open source project that is perceived to be the only real open source option available in the market today. All things considered, the natural next question would be, “Where are platforms like MySQL, Oracle and PostgreSQL headed?” Converting them into NoSQL databases is not a valid option, since full ACID compliance would be lost, defeating the initial purpose of their creation, and having them process new data types does not fit into their repertoire.
As a result of these changes, these shortfalls have made room for new databases and new data manipulation techniques and methodologies. Now that the foundation of data growth patterns has been laid out, the question that needs to be asked today when developing an application of any kind (be it mobile, web or enterprise) is, “What are its data needs?” This takes into consideration the data type, demand, and criticality of the online service when investigating the best tools and tactics for storage, management, and manipulation. Clearly identifying the categories of data that the application will use or consume, along with the type of manipulation required, determines the need for a relational database, a non-relational (NoSQL) database, or something in between. These are the questions that need to be asked when developing an application from a data point of view.
If an application does not need to be supported by a full-blown relational database, NoSQL is the easiest, most scalable, and most straightforward choice. One of the main differences between NoSQL and MySQL is NoSQL’s schema-less nature. With MySQL, the application always needs to be aware of the data model of the schema. Since there is no fixed schema in a NoSQL database, queries cannot rely on a predefined structure; on the other hand, this permits the database to change (almost) automatically as the application is developed.
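To make the contrast concrete, here is a minimal sketch using Python’s built-in sqlite3 for the relational side and plain dictionaries as a stand-in for schema-less documents; the table and field names are hypothetical:

```python
import sqlite3

# Relational side: the schema must be declared up front, and the
# application must know it before it can insert or query rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO customers VALUES (?, ?)", ("Ada", "Lovelace"))
# Adding a new attribute requires an explicit migration step:
conn.execute("ALTER TABLE customers ADD COLUMN city TEXT")

# Schema-less side: each "document" is free-form, so a new field can
# simply appear on newer records with no migration at all.
customers = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing", "city": "London"},
]
with_city = [c for c in customers if "city" in c]
print(with_city[0]["city"])  # prints "London"
```

The trade-off shows up on the query side: the relational table can be queried by any declared column, while the document list can only be filtered by fields the application already knows might be present.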
There are three main varieties of NoSQL databases:
- Document-based databases, such as Mongo, the most widely adopted variety, which allow for effortless storage and retrieval of document-based objects.
- Key-value stores, where the values you put in are exactly what you get out, with strong scaling abilities.
- The Hadoop family of products, used primarily for data crunching and processing, which is currently replacing older business intelligence systems like Oracle’s, cutting processing time so answers come back in minutes, or even seconds.
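A hand-rolled sketch of the document-store idea may help; the class, field names, and exact-match semantics below are illustrative only, not the actual Mongo API:

```python
class TinyDocStore:
    """A toy, in-memory stand-in for a document database like Mongo."""

    def __init__(self):
        self.docs = []

    def insert(self, doc):
        # Documents are stored as-is; no schema is enforced.
        self.docs.append(doc)

    def find(self, query):
        # Return every document whose fields match all key/value
        # pairs in the query (Mongo-style exact matching).
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in query.items())]


store = TinyDocStore()
store.insert({"sku": "A1", "name": "lamp", "manufacturer": "Acme"})
store.insert({"sku": "B2", "name": "desk", "manufacturer": "Initech"})
print(store.find({"manufacturer": "Acme"}))
```

Real document databases add indexing, persistence, and richer query operators on top of this basic insert/find shape.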
If a modern web-scale application requires ACID compliance, the most efficient way to work nowadays is by utilizing both SQL and NoSQL in order to get the best performance out of your database.
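One way to sketch this hybrid pattern: keep the transactional system of record relational, and serve reads from a NoSQL-style key-value view. Here sqlite3 and a plain dict stand in for the real stores, and all names are illustrative:

```python
import sqlite3

# System of record: relational and transactional (ACID).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

# Read-side cache: a key-value view rebuilt from committed data,
# standing in for a NoSQL store such as a document or KV database.
cache = {}

def place_order(order_id, total):
    with db:  # commits atomically; rolls back on error
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
    # Only after the transaction commits is the read view updated.
    cache[order_id] = {"id": order_id, "total": total}

place_order(1, 19.99)
print(cache[1])  # reads are served without touching the relational store
```

The design choice is that writes pay the full ACID cost once, while the high-volume read path hits the cheap, horizontally scalable store.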
Where does ScaleBase fit in?
As a developer, if you are approaching a very large data set and dealing with an escalating number of transactions, you can do one of three things:
- You can change from MySQL to Oracle, which will buy you a bit more time, since Oracle can process much more than MySQL, bearing in mind that Oracle will reach a cut-off point, as well.
- You can switch to a full-blown NoSQL database, losing ACID guarantees and the ability to manipulate the data relationally.
- You can do something in between, which would take MySQL, remove the barriers, and make it look as if it’s a NoSQL database in terms of scale, which is what ScaleBase does.
These are the three options available if you have already hit the threshold for database scalability or anticipate your application getting to scales that are not manageable by the more well-known database hardware and software of today.
The actual ScaleBase process involves taking a MySQL database that is hitting, or is on the verge of hitting, a wall in terms of scale, specifically by size, and dividing it up into multiple pieces of data, turning the database into a cluster. After inspecting the schema, a recommendation is made regarding the best way to distribute the data, which is then analyzed, optimized, and distributed across as many nodes as necessary. With a virtually linear growth rate, the number of potential application processes is greatly increased. Being the sole provider of such a service, ScaleBase has created a truly revolutionary achievement in the database market.
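A simplified sketch of the general idea behind such data distribution is hash-based routing of rows to shards. This is not ScaleBase’s actual implementation; the node layout and the choice of distribution key are assumptions for illustration:

```python
import hashlib

# Hypothetical shard layout: four nodes, each of which would be a
# separate MySQL instance in a real deployment; dicts stand in here.
NODES = [dict() for _ in range(4)]

def shard_for(customer_id):
    # A stable hash of the distribution key picks the shard, so every
    # query for the same customer is routed to the same node.
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def put(customer_id, row):
    shard_for(customer_id)[customer_id] = row

def get(customer_id):
    return shard_for(customer_id).get(customer_id)

put(42, {"name": "Ada"})
print(get(42))  # routed to the same shard that stored the row
```

Because each node holds only its slice of the key space, adding nodes grows capacity roughly linearly, which is the property the cluster approach described above relies on.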