What is Commodity Hardware?

"Hadoop runs on commodity hardware." This sentence is often heard in discussions about Hadoop, but what precisely does it mean?
Just as the definition of "big" in "Big Data" is relative to a company or industry, the definition of "commodity" in "commodity hardware" is relative to a point in time and an industry. Still, several general points can be made.
Commodity hardware in general
- Commodity hardware has an average amount of computing resources; it is not considered a "sports car" in its field.
- "Commodity hardware" does not imply low quality, but rather, affordability.
- One common, though not universal, feature of commodity hardware is that over time it comes to be widely used in roles for which it was not specifically designed, in contrast to purpose-built hardware.
Commodity hardware in the context of Hadoop
- Hadoop clusters are run on servers.
- Most commodity servers used in production Hadoop clusters have a balanced ratio of disk space to memory (what counts as "balanced" also changes over time), as opposed to being specialized servers with exceptionally high memory or CPU.
- The servers are not designed specifically as parts of a distributed storage and processing framework, but have been appropriated for this role in Hadoop.
Examples of Commodity Hardware in Hadoop

An example of suggested hardware specifications for a production Hadoop cluster is:
- four 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
- two quad-core CPUs, running at 2-2.5GHz or faster
- 16-24GB of RAM (24-32GB if you're considering HBase)
- 1 Gigabit Ethernet
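A JBOD layout like the one above is exposed to HDFS by listing each disk's mount point separately in `hdfs-site.xml`; the DataNode then spreads block writes across the disks. A minimal sketch, assuming the four disks are mounted at hypothetical paths `/data/1` through `/data/4`:

```xml
<!-- hdfs-site.xml (sketch): one entry per physical disk.
     The mount paths are hypothetical examples, not required names. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
</property>
```

This is why Hadoop favors JBOD over RAID for data disks: HDFS already replicates blocks across nodes, so striping or mirroring within one node adds little protection while coupling all disks to the speed of the slowest one.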
Or, for a more powerful cluster:
- six 2TB hard disks, with RAID 1 across two of the disks
- two quad-core CPUs
- 32-64GB of ECC (Error-Correcting Code) RAM
- 2-4 Gigabit Ethernet
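Since "commodity" specs drift over time, it can help to script a quick sanity check of a candidate node against a baseline. A minimal sketch, assuming Linux (it reads `/proc/meminfo`) and using the thresholds from the first spec list above as illustrative values, not hard requirements:

```shell
#!/bin/sh
# Sketch: compare this node's resources against an example baseline
# (8 cores, 16GB RAM, 4 data disks). Thresholds are assumptions
# taken from the spec list above, not Hadoop requirements.

cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
disks=$(lsblk -d -n -o NAME,TYPE 2>/dev/null | awk '$2 == "disk"' | wc -l)

echo "cores=${cores} mem_gb=${mem_gb} disks=${disks}"

[ "$cores" -ge 8 ]   || echo "warning: fewer than 8 cores (two quad-core CPUs)"
[ "$mem_gb" -ge 16 ] || echo "warning: less than 16GB of RAM"
[ "$disks" -ge 4 ]   || echo "warning: fewer than 4 data disks"
```

Running it on each machine before it joins the cluster catches under-provisioned nodes early, which matters because Hadoop schedules work assuming roughly uniform workers.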