Table of Contents
Preface
Chapter 1: Getting Started
Chapter 2: The Command-line Interface
Chapter 3: Application Programmer Interface
Chapter 4: Performance Tuning
Chapter 5: Consistency, Availability, and Partition Tolerance with Cassandra
Chapter 6: Schema Design
Chapter 7: Administration
Chapter 8: Multiple Datacenter Deployments
Chapter 9: Coding and Internals
Chapter 10: Libraries and Applications
Chapter 11: Hadoop and Cassandra
Chapter 12: Collecting and Analyzing Performance Statistics
Chapter 13: Monitoring Cassandra Servers
Index
- Chapter 1: Getting Started
- Introduction
- A simple single node Cassandra installation
- Reading and writing test data using the command-line interface
- Running multiple instances on a single machine
- Scripting a multiple instance installation
- Setting up a build and test environment for tasks in this book
- Running in the foreground with full debugging
- Calculating ideal Initial Tokens for use with Random Partitioner
- Choosing Initial Tokens for use with Partitioners that preserve ordering
- Insight into Cassandra with JConsole
- Connecting with JConsole over a SOCKS proxy
- Connecting to Cassandra with Java and Thrift
- Chapter 2: The Command-line Interface
- Connecting to Cassandra with the CLI
- Creating a keyspace from the CLI
- Creating a column family with the CLI
- Describing a keyspace
- Writing data with the CLI
- Reading data with the CLI
- Deleting rows and columns from the CLI
- Listing and paginating all rows in a column family
- Dropping a keyspace or a column family
- CLI operations with super columns
- Using the assume keyword to decode column names or column values
- Supplying time-to-live information when inserting columns
- Using built-in CLI functions
- Using column metadata and comparators for type enforcement
- Changing the consistency level of the CLI
- Getting help from the CLI
- Loading CLI statements from a file
- Chapter 3: Application Programmer Interface
- Introduction
- Connecting to a Cassandra server
- Creating a keyspace and column family from the client
- Using MultiGet to limit round trips and overhead
- Writing unit tests with an embedded Cassandra server
- Cleaning up data directories before unit tests
- Generating Thrift bindings for other languages (C++, PHP, and others)
- Using the Cassandra Storage Proxy "Fat Client"
- Using range scans to find and remove old data
- Iterating all the columns of a large key
- Slicing columns in reverse
- Batch mutations to improve insert performance and code robustness
- Using TTL to create columns with self-deletion times
- Working with secondary indexes
- Chapter 4: Performance Tuning
- Introduction
- Choosing an operating system and distribution
- Choosing a Java Virtual Machine
- Using a dedicated Commit Log disk
- Choosing a high-performing RAID level
- File system optimization for hard disk performance
- Boosting read performance with the Key Cache
- Boosting read performance with the Row Cache
- Disabling Swap Memory for predictable performance
- Stopping Cassandra from using swap without disabling it system-wide
- Enabling Memory Mapped Disk modes
- Tuning Memtables for write-heavy workloads
- Saving memory on 64-bit architectures with compressed pointers
- Tuning concurrent readers and writers for throughput
- Setting compaction thresholds
- Garbage collection tuning to avoid JVM pauses
- Raising the open file limit to deal with many clients
- Increasing performance by scaling up
- Chapter 5: Consistency, Availability, and Partition Tolerance with Cassandra
- Introduction
- Working with the formula for strong consistency
- Supplying the timestamp value with write requests
- Disabling the hinted handoff mechanism
- Adjusting read repair chance for less intensive data reads
- Confirming schema agreement across the cluster
- Adjusting replication factor to work with quorum
- Using write consistency ONE, read consistency ONE for low latency operations
- Using write consistency QUORUM, read consistency QUORUM for strong consistency
- Mixing consistency levels: write consistency QUORUM, read consistency ONE
- Choosing consistency over availability with consistency ALL
- Choosing availability over consistency with write consistency ANY
- Demonstrating how consistency is not a lock or a transaction
- Chapter 6: Schema Design
- Introduction
- Saving disk space by using small column names
- Serializing data into large columns for smaller index sizes
- Storing time series data effectively
- Using Super Columns for nested maps
- Using a lower Replication Factor for disk space saving and performance enhancements
- Hybrid Random Partitioner using Order Preserving Partitioner
- Storing large objects
- Using Cassandra for distributed caching
- Storing large or infrequently accessed data in a separate column family
- Storing and searching edge graph data in Cassandra
- Developing secondary data orderings or indexes
- Chapter 7: Administration
- Defining seed nodes for Gossip Communication
- Nodetool Move: Moving a node to a specific ring location
- Nodetool Remove: Removing a downed node
- Nodetool Decommission: Removing a live node
- Joining nodes quickly with auto_bootstrap set to false
- Generating SSH keys for password-less interaction
- Copying the data directory to new hardware
- A node join using external data copy methods
- Nodetool Repair: When to use anti-entropy repair
- Nodetool Drain: Stable files on upgrade
- Lowering gc_grace for faster tombstone cleanup
- Scheduling Major Compaction
- Using nodetool snapshot for backups
- Clearing snapshots with nodetool clearsnapshot
- Restoring from a snapshot
- Exporting data to JSON with sstable2json
- Nodetool Cleanup: Removing excess data
- Nodetool Compact: Defragment data and remove deleted data from disk
- Chapter 8: Multiple Datacenter Deployments
- Changing debugging to determine where read operations are being routed
- Using IPTables to simulate complex network scenarios in a local environment
- Choosing IP addresses to work with RackInferringSnitch
- Scripting a multiple datacenter installation
- Determining natural endpoints, datacenter, and rack for a given key
- Manually specifying Rack and Datacenter configuration with a property file snitch
- Troubleshooting dynamic snitch using JConsole
- Quorum operations in multi-datacenter environments
- Using traceroute to troubleshoot latency between network devices
- Ensuring bandwidth between switches in multiple rack environments
- Increasing rpc_timeout for dealing with latency across datacenters
- Changing consistency level from the CLI to test various consistency levels with multiple datacenter deployments
- Using the consistency levels TWO and THREE
- Calculating Ideal Initial Tokens for use with Network Topology Strategy and Random Partitioner
- Chapter 9: Coding and Internals
- Introduction
- Installing common development tools
- Building Cassandra from source
- Creating your own type by subclassing AbstractType
- Using validation to check data on insertion
- Communicating with the Cassandra developers and users through IRC and e-mail
- Generating a diff using Subversion's diff feature
- Applying a diff using the patch command
- Using strings and od to quickly search through data files
- Customizing the sstable2json export utility
- Configuring index interval ratio for lower memory usage
- Increasing phi_convict_threshold for less reliable networks
- Using the Cassandra maven plugin
- Chapter 10: Libraries and Applications
- Introduction
- Building the contrib stress tool for benchmarking
- Inserting and reading data with the stress tool
- Running the Yahoo! Cloud Serving Benchmark
- Hector, a high-level client for Cassandra
- Doing batch mutations with Hector
- Cassandra with the Java Persistence API (JPA)
- Setting up Solandra for full text indexing with a Cassandra backend
- Setting up ZooKeeper to support Cages for transactional locking
- Using Cages to implement an atomic read and set
- Using Groovandra as a CLI alternative
- Searchable log storage with Logsandra
- Chapter 11: Hadoop and Cassandra
- Introduction
- A pseudo-distributed Hadoop setup
- A Map-only program that reads from Cassandra using the ColumnFamilyInputFormat
- A Map-only program that writes to Cassandra using the CassandraOutputFormat
- Using MapReduce to do grouping and counting with Cassandra input and output
- Setting up Hive with Cassandra Storage Handler support
- Defining a Hive table over a Cassandra Column Family
- Joining two Column Families with Hive
- Grouping and counting column values with Hive
- Co-locating Hadoop TaskTrackers on Cassandra nodes
- Setting up a "Shadow" data center for running only MapReduce jobs
- Setting up DataStax Brisk, the combined stack of Cassandra, Hadoop, and Hive
- Chapter 12: Collecting and Analyzing Performance Statistics
- Finding bottlenecks with nodetool tpstats
- Using nodetool cfstats to retrieve column family statistics
- Monitoring CPU utilization
- Adding read/write graphs to find active column families
- Using Memtable graphs to profile when and why they flush
- Graphing SSTable count
- Monitoring disk utilization and having a performance baseline
- Monitoring compaction by graphing its activity
- Using nodetool compactionstats to check the progress of compaction
- Graphing column family statistics to track average/max row sizes
- Using latency graphs to profile time to seek keys
- Tracking the physical disk size of each column family over time
- Using nodetool cfhistograms to see the distribution of query latencies
- Tracking open networking connections
- Chapter 13: Monitoring Cassandra Servers
- Introduction
- Forwarding Log4j logs to a central server
- Using top to understand overall performance
- Using iostat to monitor current disk performance
- Using sar to review performance over time
- Using JMXTerm to access Cassandra JMX
- Monitoring the garbage collection events
- Using tpstats to find bottlenecks
- Creating a Nagios Check Script for Cassandra
- Keeping an eye out for large rows with compaction limits
- Reviewing network traffic with IPTraf
- Keeping a lookout for dropped messages
- Inspecting column families for dangerous conditions