Clojure Data Analysis Cookbook
- Get a handle on the torrent of data the modern Internet has created
- Recipes for every stage from collection to analysis
- A practical approach to analyzing data to help you make informed decisions
Book Details
Language : English
Paperback : 342 pages [ 235mm x 191mm ]
Release Date : March 2013
ISBN : 178216264X
ISBN 13 : 9781782162643
Author(s) : Eric Rochester
Topics and Technologies : All Books, Data, Cookbooks, Open Source
Table of Contents
Preface
Chapter 1: Importing Data for Analysis
Chapter 2: Cleaning and Validating Data
Chapter 3: Managing Complexity with Concurrent Programming
Chapter 4: Improving Performance with Parallel Programming
Chapter 5: Distributed Data Processing with Cascalog
Chapter 6: Working with Incanter Datasets
Chapter 7: Preparing for and Performing Statistical Data Analysis with Incanter
Chapter 8: Working with Mathematica and R
Chapter 9: Clustering, Classifying, and Working with Weka
Chapter 10: Graphing in Incanter
Chapter 11: Creating Charts for the Web
Index
- Chapter 1: Importing Data for Analysis
  - Introduction
  - Creating a new project
  - Reading CSV data into Incanter datasets
  - Reading JSON data into Incanter datasets
  - Reading data from Excel with Incanter
  - Reading data from JDBC databases
  - Reading XML data into Incanter datasets
  - Scraping data from tables in web pages
  - Scraping textual data from web pages
  - Reading RDF data
  - Reading RDF data with SPARQL
  - Aggregating data from different formats
- Chapter 2: Cleaning and Validating Data
  - Introduction
  - Cleaning data with regular expressions
  - Maintaining consistency with synonym maps
  - Identifying and removing duplicate data
  - Normalizing numbers
  - Rescaling values
  - Normalizing dates and times
  - Lazily processing very large data sets
  - Sampling from very large data sets
  - Fixing spelling errors
  - Parsing custom data formats
  - Validating data with Valip
- Chapter 3: Managing Complexity with Concurrent Programming
  - Introduction
  - Managing program complexity with STM
  - Managing program complexity with agents
  - Getting better performance with commute
  - Combining agents and STM
  - Maintaining consistency with ensure
  - Introducing safe side effects into the STM
  - Maintaining data consistency with validators
  - Tracking processing with watchers
  - Debugging concurrent programs with watchers
  - Recovering from errors in agents
  - Managing input with sized queues
- Chapter 4: Improving Performance with Parallel Programming
  - Introduction
  - Parallelizing processing with pmap
  - Parallelizing processing with Incanter
  - Partitioning Monte Carlo simulations for better pmap performance
  - Finding the optimal partition size with simulated annealing
  - Parallelizing with reducers
  - Generating online summary statistics with reducers
  - Harnessing your GPU with OpenCL and Calx
  - Using type hints
  - Benchmarking with Criterium
- Chapter 5: Distributed Data Processing with Cascalog
  - Introduction
  - Distributed processing with Cascalog and Hadoop
  - Querying data with Cascalog
  - Distributing data with Apache HDFS
  - Parsing CSV files with Cascalog
  - Complex queries with Cascalog
  - Aggregating data with Cascalog
  - Defining new Cascalog operators
  - Composing Cascalog queries
  - Handling errors in Cascalog workflows
  - Transforming data with Cascalog
  - Executing Cascalog queries in the Cloud with Pallet
- Chapter 6: Working with Incanter Datasets
  - Introduction
  - Loading Incanter's sample datasets
  - Loading Clojure data structures into datasets
  - Viewing datasets interactively with view
  - Converting datasets to matrices
  - Using infix formulas in Incanter
  - Selecting columns with $
  - Selecting rows with $
  - Filtering datasets with $where
  - Grouping data with $group-by
  - Saving datasets to CSV and JSON
  - Projecting from multiple datasets with $join
- Chapter 7: Preparing for and Performing Statistical Data Analysis with Incanter
  - Introduction
  - Generating summary statistics with $rollup
  - Differencing variables to show changes
  - Scaling variables to simplify variable relationships
  - Working with time series data with Incanter Zoo
  - Smoothing variables to decrease noise
  - Validating sample statistics with bootstrapping
  - Modeling linear relationships
  - Modeling non-linear relationships
  - Modeling multimodal Bayesian distributions
  - Finding data errors with Benford's law
- Chapter 8: Working with Mathematica and R
  - Introduction
  - Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
  - Setting up Mathematica to talk to Clojuratica for Windows
  - Calling Mathematica functions from Clojuratica
  - Sending matrices to Mathematica from Clojuratica
  - Evaluating Mathematica scripts from Clojuratica
  - Creating functions from Mathematica
  - Processing functions in parallel in Mathematica
  - Setting up R to talk to Clojure
  - Calling R functions from Clojure
  - Passing vectors into R
  - Evaluating R files from Clojure
  - Plotting in R from Clojure
- Chapter 9: Clustering, Classifying, and Working with Weka
  - Introduction
  - Loading CSV and ARFF files into Weka
  - Filtering and renaming columns in Weka datasets
  - Discovering groups of data using K-means clustering
  - Finding hierarchical clusters in Weka
  - Clustering with SOMs in Incanter
  - Classifying data with decision trees
  - Classifying data with the Naive Bayesian classifier
  - Classifying data with support vector machines
  - Finding associations in data with the Apriori algorithm
- Chapter 10: Graphing in Incanter
  - Introduction
  - Creating scatter plots with Incanter
  - Creating bar charts with Incanter
  - Graphing non-numeric data in bar charts
  - Creating histograms with Incanter
  - Creating function plots with Incanter
  - Adding equations to Incanter charts
  - Adding lines to scatter charts
  - Customizing charts with JFreeChart
  - Saving Incanter graphs to PNG
  - Using PCA to graph multi-dimensional data
  - Creating dynamic charts with Incanter
- Chapter 11: Creating Charts for the Web
  - Introduction
  - Serving data with Ring and Compojure
  - Creating HTML with Hiccup
  - Setting up to use ClojureScript
  - Creating scatter plots with NVD3
  - Creating bar charts with NVD3
  - Creating histograms with NVD3
  - Visualizing graphs with force-directed layouts
  - Creating interactive visualizations with D3
What you will learn
- Create beautiful, insightful graphs that you can publish to the Internet
- Apply powerful clustering and data mining techniques to better understand your data
- Use powerful data analysis libraries like Incanter, Hadoop, and Weka to get things done quickly
- Interface with Mathematica and R to use the powerful analysis features they provide
- Process data concurrently and in parallel for faster performance
- Transform data to make it more useful and easier to analyze
Data is everywhere, and it is increasingly important to draw insights from it that we can act on. Using Clojure for data collection and analysis, this book shows you how to gain fresh insights and perspectives from your data through a collection of practical, structured recipes.
The "Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether you are scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.
You'll learn how to acquire data, clean it up, and transform it into useful graphs that can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modeling, and even running data mining algorithms such as K-means clustering, neural networks, and association rules.
Full of practical tips, the "Clojure Data Analysis Cookbook" will help you fully utilize your data through a series of step-by-step, real-world recipes covering every aspect of data analysis.
Prior experience with Clojure and with data analysis techniques and workflows is beneficial but not essential.