Computers need operating systems (OSes) to function. An operating system is the base layer of software that supports a computer's basic functions, makes it work and, above all, makes it usable. The best-known operating systems for personal computers are Windows, macOS and Linux. One of the most fundamental services an operating system provides is the file system.
For example, Windows users are familiar with the file system that Microsoft provides: a folder hierarchy in which they can store data of any kind, for example documents, music and pictures. Just like ordinary computers, computer clusters also need software that provides such basic functions, e.g. coordination between the different nodes of the cluster. One such software environment for operating a computer cluster is Apache Hadoop.
Software environments for operating computer clusters must provide a distributed file system. Just as on ordinary computers, users need a way to store their data in a computer cluster. Implementing a file system on a single computer is simple compared to implementing one in a distributed system.
The reason is that files stored across multiple computers must be split up and stored in parallel on several nodes, all transparently to the user. This is hard to get right (just think how difficult it is to remember everything you have packed into little boxes when you move). Two examples of distributed file systems are the Google File System (GFS) and the Hadoop Distributed File System (HDFS).
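To make the idea of splitting a file across nodes concrete, here is a minimal sketch in Python. It is an illustration of the general technique, not HDFS's actual implementation; the tiny block size, the node names and the replication factor are made up for the example (real systems use block sizes on the order of 128 MB).

```python
# Sketch of a distributed file system's core idea: split a file into
# fixed-size blocks and place each block on several nodes.
# Illustrative only; block size, node names and replication factor
# are assumptions for this example.

BLOCK_SIZE = 4            # bytes; real systems use e.g. 128 MB
NODES = ["node1", "node2", "node3"]
REPLICATION = 2           # each block is stored on two nodes

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a byte string into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        placement[i] = {"data": block, "nodes": replicas}
    return placement

def reassemble(placement):
    """Read the blocks back in order, as a client library would."""
    return b"".join(placement[i]["data"] for i in sorted(placement))

blocks = split_into_blocks(b"hello cluster!")
placement = place_blocks(blocks)
assert reassemble(placement) == b"hello cluster!"
```

The point of the sketch is the "seamless for the user" part: the user hands over one file and gets one file back, while the system quietly tracks which block lives on which nodes, and replication means a single failed node does not lose data.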