All Questions
123
votes
20 answers
33k views
Python as a statistics workbench
Lots of people use a main tool like Excel or another spreadsheet, SPSS, STATA, or R for their statistics needs. They might turn to some specific package for very special needs, but a lot of things can ...
101
votes
49 answers
28k views
What is your favorite “data analysis” cartoon?
This is one of my favorites:
One entry per answer. This is in the vein of the Stack Overflow question What’s your favorite “programmer” cartoon?.
P.S. Do not hotlink the cartoon without the site's ...
99
votes
128 answers
20k views
Famous statistician quotes
What is your favorite statistician quote?
This is community wiki, so please post one quote per answer.
97
votes
12 answers
12k views
The Two Cultures: statistics vs. machine learning?
Last year, I read a blog post from Brendan O'Connor entitled "Statistics vs. Machine Learning, fight!" that discussed some of the differences between the two fields. Andrew Gelman responded to ...
95
votes
18 answers
31k views
Making sense of principal component analysis, eigenvectors & eigenvalues
In today's pattern recognition class my professor talked about PCA, eigenvectors & eigenvalues.
I got the mathematics of it. If I'm asked to find eigenvalues etc. I'll do it correctly like a ...
83
votes
8 answers
13k views
Detecting a given face in a database of facial images
I'm working on a little project involving the faces of Twitter users via their profile pictures.
A problem I've encountered is that after I filter out all but the images that are clear portrait ...
80
votes
41 answers
5k views
What are common statistical sins?
I'm a grad student in psychology, and as I pursue more and more independent studies in statistics, I am increasingly amazed by the inadequacy of my formal training. Both personal and second hand ...
77
votes
16 answers
19k views
Why square the difference instead of taking the absolute value in standard deviation?
In the definition of standard deviation, why do we have to square the difference from the mean to get the mean (E) and take the square root back at the end? Can't we just simply take the absolute ...
70
votes
13 answers
7k views
Bayesian and frequentist reasoning in plain English
How would you describe in plain English the characteristics that distinguish Bayesian from Frequentist reasoning?
66
votes
10 answers
13k views
Is there any reason to prefer the AIC or BIC over the other?
The AIC and BIC are both methods of assessing model fit penalized for the number of estimated parameters. As I understand it, BIC penalizes models more for free parameters than does AIC. Beyond a ...
61
votes
7 answers
17k views
What is the difference between “likelihood” and “probability”?
The Wikipedia page claims that likelihood and probability are distinct concepts.
In non-technical parlance, "likelihood" is usually a synonym for "probability," but in statistical usage there is a ...
59
votes
16 answers
4k views
How to annoy a statistical referee?
I recently asked a question regarding general principles around reviewing statistics in papers. What I would now like to ask is what particularly irritates you when reviewing a paper, i.e. what's the ...
57
votes
6 answers
4k views
Is $R^2$ useful or dangerous?
I was skimming through some lecture notes by Cosma Shalizi (in particular, section 2.1.1 of the second lecture), and was reminded that you can get very low $R^2$ even when you have a completely linear ...
56
votes
21 answers
3k views
Locating freely available data samples
I've been working on a new method for analyzing and parsing datasets to identify and isolate subgroups of a population without foreknowledge of any subgroup's characteristics. While the method works ...
53
votes
6 answers
2k views
Explaining to laypeople why bootstrapping works
I recently used bootstrapping to estimate confidence intervals for a project. Someone who doesn't know much about statistics recently asked me to explain why bootstrapping works, i.e., why is it that ...