What is a distributed system and why is it so complex?
With the ever-increasing technological expansion of the world, distributed systems are becoming more and more widespread. They are a vast and complex field of study in computer science.
The purpose of this article is to introduce you to distributed systems in a basic way, giving you a glimpse of the different categories of such systems while diving deeply into the details.
What is a distributed system?
In its simplest definition, a distributed system is a group of computers working together that appear as a single computer to the end user.
These machines have a shared state, operate concurrently and can fail independently without affecting the uptime of the entire system.
Let’s go with a database! Traditional databases are stored on a single machine’s filesystem, whenever you want to fetch/insert information into it – you talk directly to that machine.
For us to distribute this database system, we have to run this database on multiple machines at the same time. The user should be able to talk to whatever machine he chooses and should not be able to tell that he is not talking to a machine – if he inserts a record in node #1, node #3 will not be able to tell that record Should be able to return.
Why distribute a system?
Systems are always delivered by need. The truth of the matter is – managing distributed systems is a complex subject fraught with pitfalls and landmines. Deploying, maintaining and debugging distributed systems is a headache, so why go there?
What a distributed system enables you to do is scale horizontally. Going back to our previous example of a single database server, the only way to handle more traffic would be to upgrade the hardware the database is running on. This is called vertical scaling.
Vertical scaling is all well and good as long as you can, but after a certain point you’ll notice that even the best hardware isn’t enough for enough traffic, not impractical to host.
Scaling horizontally means adding more computers instead of just upgrading one’s hardware.
It is much cheaper than scaling vertically after a certain threshold but it is not its main matter for preference.
Vertical scaling can only increase your performance to the capabilities of the latest hardware. These capabilities prove insufficient for tech companies with medium to large workloads.
The great thing about horizontal scaling is that you have no limit on how much you can scale – whenever performance degrades you just add another machine, potentially to infinity.
Easy scaling isn’t the only benefit you get from distributed systems. Equally important are fault tolerance and low latency.
Fault Tolerance – A group of ten machines in two data centers is inherently more fault-tolerant than a single machine. Even if a data center fire breaks out, your application will still work.
Low latency – the time it takes for a network packet to travel the world is physically limited by the speed of light. For example, a fiber-optic cable from New York to Sydney has the shortest possible time to round-trip time (that is, go back and forth) of a request. Distributed systems allow you to have a node in both cities, allowing traffic to hit the node that is closest to it.
For a distributed system to work, however, you need software that runs on machines that are specifically designed to run on multiple computers at the same time and handle the problems that come with it. This proves to be no easy feat.
scaling our database
Imagine that our web application has become extremely popular. Also imagine that our database started getting twice as many queries per second as it can handle. Your application will immediately start degrading in performance and this will be noticed by your users.
Let’s work together and scale our database to meet your high demands.
In a typical web application you normally read information more often than you would by inserting new information or modifying old one.
There is one way to increase the read performance and that is the so called primary-replication replication strategy. Here, you create two new database servers that sync with the main server. The catch is that you can only read from these new examples.
Whenever you insert or modify information — you are talking to the primary database. In turn, it asynchronously notifies replicas of the change and they save it as well.
Congratulations, you can now execute 3x more read queries! Isn’t it great?
Gotcha! We quickly lost C in the ACID guarantee of our relational database, which stands for consistency.
You see, there now exists a possibility in which we insert a new record into the database, immediately thereafter issue a read query for it and get nothing back, as if it didn’t exist!
Propagating new information from the primary to the replica does not happen immediately.