
Machine learner and data scientist, Ph.D. from the University of Bonn in 2005, now working as a PostDoc at TU Berlin and as chief data scientist and co-founder at TWIMPACT, a startup focusing on real-time social media analysis.

An Introduction to Bayesian Inference

06.24.2012

I recently rediscovered these slides from a talk I gave back in 2007 and wanted to share them with you. For those of you who don’t know, Bayesian inference is a certain way to approach learning from data and statistical inference. It’s named after Thomas Bayes, an English mathematician who lived in the 18th century.

The main idea (and please be kind with me, I’m not a Bayesian) of Bayesian inference is to model your data and your expectations about the data using probability distributions. You write down a so-called generative model for the data, that is, what you expect the distribution of your data to be given its model parameters. Then, if you also specify your prior belief about the distribution of the parameters, you can derive an updated distribution over your parameters given observed data.
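To make this concrete, here is a minimal sketch of that update for a coin-flip model. The generative model, the uniform prior, the grid of candidate parameter values, and the observed flips are all assumptions chosen for illustration; the point is just the mechanics of Bayes' rule: prior × likelihood, then normalize.

```python
# Bayes' rule on a discrete grid for a Bernoulli (coin-flip) model.
# Everything here (data, prior, grid) is a toy assumption for illustration.

grid = [i / 100 for i in range(1, 100)]      # candidate values of the bias p
prior = [1 / len(grid)] * len(grid)          # uniform prior belief over p

data = [1, 1, 0, 1, 1, 1, 0, 1]              # hypothetical observed flips

def likelihood(p, data):
    """Generative model: each flip is heads with probability p."""
    heads = sum(data)
    tails = len(data) - heads
    return p ** heads * (1 - p) ** tails

# Posterior ∝ likelihood × prior, normalized over the grid.
unnorm = [likelihood(p, data) * pr for p, pr in zip(grid, prior)]
evidence = sum(unnorm)                       # the normalizing constant
posterior = [u / evidence for u in unnorm]

# Summarize the updated belief by its mean.
post_mean = sum(p * w for p, w in zip(grid, posterior))
print(round(post_mean, 3))
```

With 6 heads in 8 flips and a flat prior, the posterior mean lands near 0.7 rather than the raw frequency 0.75 — the prior still pulls the estimate toward the middle.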

Bayesian inference has been applied to a whole range of inference problems, from classification to regression to clustering, and beyond. The main inference step sketched above involves integrating (as in ∫f(x)dx, not as in continuous integration ;)) over the parameter space, which is numerically intractable for most complex distributions. Therefore, Bayesian inference often relies on techniques from numerical integration like Markov chain Monte Carlo methods, Gibbs sampling, or other kinds of approximations like Variational Bayes, which is related to mean-field approximations used in statistical physics.
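As a rough illustration of the Monte Carlo route, here is a random-walk Metropolis sampler (one of the simplest MCMC methods) drawing from the same toy coin-flip posterior. The observed counts, proposal width, and burn-in length are all assumptions for the sketch; the key trick is that the sampler only ever needs the *unnormalized* posterior, so the intractable integral never has to be computed.

```python
import math
import random

random.seed(0)

HEADS, TAILS = 6, 2  # hypothetical observed flips

def log_posterior(p):
    """Unnormalized log-posterior for a coin model with a flat prior."""
    if not 0.0 < p < 1.0:
        return float("-inf")   # zero prior mass outside (0, 1)
    return HEADS * math.log(p) + TAILS * math.log(1.0 - p)

# Random-walk Metropolis: propose a small Gaussian step and accept it
# with probability min(1, target(new) / target(old)).
samples = []
p = 0.5
for step in range(20000):
    proposal = p + random.gauss(0.0, 0.1)
    log_ratio = log_posterior(proposal) - log_posterior(p)
    if log_ratio >= 0 or random.random() < math.exp(log_ratio):
        p = proposal
    if step >= 5000:           # discard burn-in samples
        samples.append(p)

mcmc_mean = sum(samples) / len(samples)
print(round(mcmc_mean, 2))
```

The sample mean should agree closely with the grid-based posterior mean — the integral has been replaced by an average over correlated draws from the posterior.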

There is a very silly (at least IMHO) divide within the field of statistics between Frequentists and Bayesians, which I’ve discussed elsewhere.

In any case, the slides above discuss the very basics: Bayes’ rule, the role of the prior, the concept of conjugacy (combinations of model assumptions and priors which can be solved exactly, that is, without requiring numerical integration) and pseudo-counts, and a bit of discussion on the Frequentist vs. Bayesian divide.
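Conjugacy is worth a tiny sketch of its own. For the coin-flip model, a Beta(a, b) prior is conjugate to the Bernoulli likelihood, so the posterior is again a Beta distribution with updated parameters — no integration needed — and the prior parameters behave like "pseudo-counts" of imaginary heads and tails seen before any real data. The prior values and observed counts below are assumptions for illustration.

```python
# Conjugate Beta-Bernoulli update: Beta(a, b) prior + observed flips
# gives a Beta(a + heads, b + tails) posterior in closed form.

def beta_update(a, b, heads, tails):
    """Conjugate update: prior pseudo-counts plus observed counts."""
    return a + heads, b + tails

a, b = 2, 2                          # weak prior: one pseudo-head, one pseudo-tail
a_post, b_post = beta_update(a, b, heads=6, tails=2)

posterior_mean = a_post / (a_post + b_post)   # mean of Beta(a', b')
print(a_post, b_post, round(posterior_mean, 3))  # → 8 4 0.667
```

The pseudo-count reading: the prior acts as if we had already seen a few flips, so small real samples get smoothed toward the prior while large samples wash it out.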

 

Some Introductory Remarks on Bayesian Inference


 

Published at DZone with permission of Mikio Braun, author and DZone MVB.

