Cloud computing with Amazon Web Services
Time for action – checking the prerequisites
Time for action – downloading Hadoop
Time for action – setting up SSH
Time for action – using Hadoop to calculate Pi
Time for action – configuring the pseudo-distributed mode
Time for action – changing the base HDFS directory
Time for action – formatting the NameNode
Time for action – starting Hadoop
Time for action – WordCount, the Hello World of MapReduce
Time for action – WordCount on EMR using the management console
Comparison of local versus EMR Hadoop
The Hadoop Java API for MapReduce
Time for action – setting up the classpath
Time for action – implementing WordCount
Time for action – building a JAR file
Time for action – running WordCount on a local Hadoop cluster
Time for action – running WordCount on EMR
Time for action – WordCount the easy way
Walking through a run of WordCount
Time for action – WordCount with a combiner
Time for action – fixing WordCount to work with a combiner
Time for action – using the Writable wrapper classes
Using languages other than Java with Hadoop
Time for action – implementing WordCount using Streaming
Time for action – summarizing the UFO data
Time for action – summarizing the shape data
Time for action – correlating sighting duration to UFO shape
Time for action – performing the shape/time analysis from the command line
Time for action – using ChainMapper for field validation/analysis
Time for action – using the Distributed Cache to improve location output
Counters, status, and other output
Time for action – creating counters, task states, and writing log output
Simple, advanced, and in-between
Time for action – reduce-side join using MultipleInputs
Time for action – representing the graph
Time for action – creating the source code
Time for action – the first run
Time for action – the second run
Time for action – the third run
Time for action – the fourth and last run
Using language-independent data structures
Time for action – getting and installing Avro
Time for action – defining the schema
Time for action – creating the source Avro data with Ruby
Time for action – consuming the Avro data with Java
Time for action – generating shape summaries in MapReduce
Time for action – examining the output data with Ruby
Time for action – examining the output data with Java
Time for action – killing a DataNode process
Time for action – the replication factor in action
Time for action – intentionally causing missing blocks
Time for action – killing a TaskTracker process
Time for action – killing the JobTracker
Time for action – killing the NameNode process
Time for action – causing task failure
Time for action – handling dirty data by using skip mode
Hadoop configuration properties
Time for action – browsing default properties
Time for action – examining the default rack configuration
Time for action – adding a rack awareness script
Time for action – demonstrating the default security
Time for action – adding an additional fsimage location
Time for action – swapping to a new NameNode host
Time for action – changing job priorities and killing a job
A Relational View on Data with Hive
Time for action – installing Hive
Time for action – creating a table for the UFO data
Time for action – inserting the UFO data
Time for action – validating the table
Time for action – redefining the table with the correct column separator
Time for action – creating a table from an existing file
Time for action – performing a join
Time for action – exporting query output
Time for action – making a partitioned UFO sighting table
Time for action – adding a new User Defined Function (UDF)
Time for action – running UFO analysis on EMR
Working with Relational Databases
Time for action – installing and setting up MySQL
Time for action – configuring MySQL to allow remote connections
Time for action – setting up the employee database
Time for action – downloading and configuring Sqoop
Time for action – exporting data from MySQL to HDFS
Time for action – exporting data from MySQL into Hive
Time for action – a more selective import
Time for action – using a type mapping
Time for action – importing data from a raw query
Time for action – importing data from Hadoop into MySQL
Time for action – importing Hive data into MySQL
Time for action – fixing the mapping and re-running the export
Time for action – getting web server data into Hadoop
Time for action – installing and configuring Flume
Time for action – capturing network traffic in a log file
Time for action – logging to the console
Time for action – capturing the output of a command to a flat file
Time for action – capturing a remote file in a local flat file
Time for action – writing network traffic onto HDFS
Time for action – adding timestamps
Time for action – multi-level Flume networks
Time for action – writing to multiple sinks
What we did and didn't cover in this book