Cassandra High Performance Cookbook Table of Contents



Preface
Chapter 1: Getting Started
Chapter 2: The Command-line Interface
Chapter 3: Application Programmer Interface
Chapter 4: Performance Tuning
Chapter 5: Consistency, Availability, and Partition Tolerance with Cassandra
Chapter 6: Schema Design
Chapter 7: Administration
Chapter 8: Multiple Datacenter Deployments
Chapter 9: Coding and Internals
Chapter 10: Libraries and Applications
Chapter 11: Hadoop and Cassandra
Chapter 12: Collecting and Analyzing Performance Statistics
Chapter 13: Monitoring Cassandra Servers
Index

  • Chapter 1: Getting Started
    • Introduction
    • A simple single node Cassandra installation
    • Reading and writing test data using the command-line interface
    • Running multiple instances on a single machine
    • Scripting a multiple instance installation
    • Setting up a build and test environment for tasks in this book
    • Running in the foreground with full debugging
    • Calculating ideal Initial Tokens for use with Random Partitioner
    • Choosing Initial Tokens for use with Partitioners that preserve ordering
    • Insight into Cassandra with JConsole
    • Connecting with JConsole over a SOCKS proxy
    • Connecting to Cassandra with Java and Thrift
  • Chapter 2: The Command-line Interface
    • Connecting to Cassandra with the CLI
    • Creating a keyspace from the CLI
    • Creating a column family with the CLI
    • Describing a keyspace
    • Writing data with the CLI
    • Reading data with the CLI
    • Deleting rows and columns from the CLI
    • Listing and paginating all rows in a column family
    • Dropping a keyspace or a column family
    • CLI operations with super columns
    • Using the assume keyword to decode column names or column values
    • Supplying time to live information when inserting columns
    • Using built-in CLI functions
    • Using column metadata and comparators for type enforcement
    • Changing the consistency level of the CLI
    • Getting help from the CLI
    • Loading CLI statements from a file
  • Chapter 3: Application Programmer Interface
    • Introduction
    • Connecting to a Cassandra server
    • Creating a keyspace and column family from the client
    • Using MultiGet to limit round trips and overhead
    • Writing unit tests with an embedded Cassandra server
    • Cleaning up data directories before unit tests
    • Generating Thrift bindings for other languages (C++, PHP, and others)
    • Using the Cassandra Storage Proxy "Fat Client"
    • Using range scans to find and remove old data
    • Iterating all the columns of a large key
    • Slicing columns in reverse
    • Batch mutations to improve insert performance and code robustness
    • Using TTL to create columns with self-deletion times
    • Working with secondary indexes
  • Chapter 4: Performance Tuning
    • Introduction
    • Choosing an operating system and distribution
    • Choosing a Java Virtual Machine
    • Using a dedicated Commit Log disk
    • Choosing a high performing RAID level
    • File system optimization for hard disk performance
    • Boosting read performance with the Key Cache
    • Boosting read performance with the Row Cache
    • Disabling Swap Memory for predictable performance
    • Stopping Cassandra from using swap without disabling it system-wide
    • Enabling Memory Mapped Disk modes
    • Tuning Memtables for write-heavy workloads
    • Saving memory on 64 bit architectures with compressed pointers
    • Tuning concurrent readers and writers for throughput
    • Setting compaction thresholds
    • Garbage collection tuning to avoid JVM pauses
    • Raising the open file limit to deal with many clients
    • Increasing performance by scaling up
  • Chapter 5: Consistency, Availability, and Partition Tolerance with Cassandra
    • Introduction
    • Working with the formula for strong consistency
    • Supplying the timestamp value with write requests
    • Disabling the hinted handoff mechanism
    • Adjusting read repair chance for less intensive data reads
    • Confirming schema agreement across the cluster
    • Adjusting replication factor to work with quorum
    • Using write consistency ONE, read consistency ONE for low latency operations
    • Using write consistency QUORUM, read consistency QUORUM for strong consistency
    • Mixing levels write consistency QUORUM, read consistency ONE
    • Choosing consistency over availability consistency ALL
    • Choosing availability over consistency with write consistency ANY
    • Demonstrating how consistency is not a lock or a transaction
  • Chapter 6: Schema Design
    • Introduction
    • Saving disk space by using small column names
    • Serializing data into large columns for smaller index sizes
    • Storing time series data effectively
    • Using Super Columns for nested maps
    • Using a lower Replication Factor for disk space saving and performance enhancements
    • Hybrid Random Partitioner using Order Preserving Partitioner
    • Storing large objects
    • Using Cassandra for distributed caching
    • Storing large or infrequently accessed data in a separate column family
    • Storing and searching edge graph data in Cassandra
    • Developing secondary data orderings or indexes
  • Chapter 7: Administration
    • Defining seed nodes for Gossip Communication
    • Nodetool Move: Moving a node to a specific ring location
    • Nodetool Remove: Removing a downed node
    • Nodetool Decommission: Removing a live node
    • Joining nodes quickly with auto_bootstrap set to false
    • Generating SSH keys for password-less interaction
    • Copying the data directory to new hardware
    • A node join using external data copy methods
    • Nodetool Repair: When to use anti-entropy repair
    • Nodetool Drain: Stable files on upgrade
    • Lowering gc_grace for faster tombstone cleanup
    • Scheduling Major Compaction
    • Using nodetool snapshot for backups
    • Clearing snapshots with nodetool clearsnapshot
    • Restoring from a snapshot
    • Exporting data to JSON with sstable2json
    • Nodetool cleanup: Removing excess data
    • Nodetool Compact: Defragment data and remove deleted data from disk
  • Chapter 8: Multiple Datacenter Deployments
    • Changing debugging to determine where read operations are being routed
    • Using IPTables to simulate complex network scenarios in a local environment
    • Choosing IP addresses to work with RackInferringSnitch
    • Scripting a multiple datacenter installation
    • Determining natural endpoints, datacenter, and rack for a given key
    • Manually specifying Rack and Datacenter configuration with a property file snitch
    • Troubleshooting dynamic snitch using JConsole
    • Quorum operations in multi-datacenter environments
    • Using traceroute to troubleshoot latency between network devices
    • Ensuring bandwidth between switches in multiple rack environments
    • Increasing rpc_timeout for dealing with latency across datacenters
    • Changing consistency level from the CLI to test various consistency levels with multiple datacenter deployments
    • Using the consistency levels TWO and THREE
    • Calculating Ideal Initial Tokens for use with Network Topology Strategy and Random Partitioner
  • Chapter 9: Coding and Internals
    • Introduction
    • Installing common development tools
    • Building Cassandra from source
    • Creating your own type by subclassing AbstractType
    • Using validation to check data on insertion
    • Communicating with the Cassandra developers and users through IRC and e-mail
    • Generating a diff using subversion's diff feature
    • Applying a diff using the patch command
    • Using strings and od to quickly search through data files
    • Customizing the sstable2json export utility
    • Configuring index interval ratio for lower memory usage
    • Increasing phi_convict_threshold for less reliable networks
    • Using the Cassandra maven plugin
  • Chapter 10: Libraries and Applications
    • Introduction
    • Building the contrib stress tool for benchmarking
    • Inserting and reading data with the stress tool
    • Running the Yahoo! Cloud Serving Benchmark
    • Hector, a high-level client for Cassandra
    • Doing batch mutations with Hector
    • Cassandra with Java Persistence Architecture (JPA)
    • Setting up Solandra for full text indexing with a Cassandra backend
    • Setting up Zookeeper to support Cages for transactional locking
    • Using Cages to implement an atomic read and set
    • Using Groovandra as a CLI alternative
    • Searchable log storage with Logsandra
  • Chapter 11: Hadoop and Cassandra
    • Introduction
    • A pseudo-distributed Hadoop setup
    • A Map-only program that reads from Cassandra using the ColumnFamilyInputFormat
    • A Map-only program that writes to Cassandra using the CassandraOutputFormat
    • Using MapReduce to do grouping and counting with Cassandra input and output
    • Setting up Hive with Cassandra Storage Handler support
    • Defining a Hive table over a Cassandra Column Family
    • Joining two Column Families with Hive
    • Grouping and counting column values with Hive
    • Co-locating Hadoop Task Trackers on Cassandra nodes
    • Setting up a "Shadow" data center for running only MapReduce jobs
    • Setting up DataStax Brisk, the combined stack of Cassandra, Hadoop, and Hive
  • Chapter 12: Collecting and Analyzing Performance Statistics
    • Finding bottlenecks with nodetool tpstats
    • Using nodetool cfstats to retrieve column family statistics
    • Monitoring CPU utilization
    • Adding read/write graphs to find active column families
    • Using Memtable graphs to profile when and why they flush
    • Graphing SSTable count
    • Monitoring disk utilization and having a performance baseline
    • Monitoring compaction by graphing its activity
    • Using nodetool compaction stats to check the progress of compaction
    • Graphing column family statistics to track average/max row sizes
    • Using latency graphs to profile time to seek keys
    • Tracking the physical disk size of each column family over time
    • Using nodetool cfhistograms to see the distribution of query latencies
    • Tracking open networking connections
  • Chapter 13: Monitoring Cassandra Servers
    • Introduction
    • Forwarding Log4j logs to a central server
    • Using top to understand overall performance
    • Using iostat to monitor current disk performance
    • Using sar to review performance over time
    • Using JMXTerm to access Cassandra JMX
    • Monitoring the garbage collection events
    • Using tpstats to find bottlenecks
    • Creating a Nagios Check Script for Cassandra
    • Keeping an eye out for large rows with compaction limits
    • Reviewing network traffic with IPTraf
    • Keeping on the lookout for dropped messages
    • Inspecting column families for dangerous conditions
