Hadoop is an open-source framework for processing large amounts of data across clusters of computers using high-level data processing languages.
Its modules provide easy-to-use languages, graphical interfaces, and administration tools for handling petabytes of data on thousands of machines.
Hadoop is fault-tolerant, and its configuration lets you customize data redundancy levels.
The Hadoop project includes four bundled modules:
- Hadoop Common: the common utilities that support the other modules
- Hadoop Distributed File System (HDFS): a distributed, high-performance, redundant file system
- Hadoop YARN: a job scheduler and cluster resource manager
- MapReduce: a system for parallel data processing
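The MapReduce model behind the last module can be illustrated with a plain-Python sketch of the classic word-count job. This simulates the map, shuffle, and reduce phases in a single process; a real Hadoop job would implement Mapper and Reducer classes (typically in Java) and run them distributed across the cluster, so the function names here are illustrative only:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the values for each key (here: sum counts).
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # 3
```

Because map tasks are independent and reduce tasks only see grouped keys, the framework can run each phase in parallel on different nodes and rerun failed tasks, which is where Hadoop's fault tolerance comes from.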
For experimentation, you can download virtual machine images that ship with Hadoop, a GUI, and high-level query languages pre-installed.