Hadoop Operations and Cluster Management Cookbook
Building a Hadoop-based Big Data platform
Choosing from Hadoop alternatives
Preparing for Hadoop Installation
Choosing hardware for cluster nodes
Configuring the cluster administrator machine
Creating the kickstart file and boot media
Installing the Linux operating system
Installing Java and other tools
Configuring Hadoop in pseudo-distributed mode
Configuring Hadoop in fully-distributed mode
Validating Hadoop installation
Managing the MapReduce cluster
Checking job history from the web UI
Configuring Hadoop daemon logging
Configuring Hadoop audit logging
Configuring service-level authentication
Configuring job authorization with ACLs
Securing a Hadoop cluster with Kerberos
Configuring web UI authentication
Recovering from NameNode failure
Configuring NameNode high availability
Monitoring a Hadoop cluster with JMX
Monitoring a Hadoop cluster with Ganglia
Monitoring a Hadoop cluster with Nagios
Monitoring a Hadoop cluster with Ambari
Monitoring a Hadoop cluster with Chukwa
Tuning a Hadoop Cluster for Best Performance
Benchmarking and profiling a Hadoop cluster
Analyzing job history with Rumen
Benchmarking a Hadoop cluster with GridMix
Using Hadoop Vaidya to identify performance problems
Balancing data blocks for a Hadoop cluster
Using compression for input and output
Configuring speculative execution
Setting the proper number of map and reduce slots for the TaskTracker
Tuning the JobTracker configuration
Tuning the TaskTracker configuration
Tuning shuffle, merge, and sort parameters
Configuring memory for a Hadoop cluster
Setting the proper number of parallel copies
Configuring the reducer initialization time
Building a Hadoop Cluster with Amazon EC2 and S3
Registering with Amazon Web Services (AWS)
Managing AWS security credentials
Preparing a local machine for EC2 connection
Creating an Amazon Machine Image (AMI)