I have written the following C code to calculate the Shannon entropy of a distribution of 8-bit integers. It is inefficient for small arrays, though, because all 256 buckets are allocated and scanned no matter how short the input is. It also won't scale to, say, 32-bit integers: a table with one double per possible value would have 2^32 entries, which is 32 GiB of memory. I am not very experienced in C and don't know what the best approach would be here. If it would simplify things, I could use C++ or Objective-C...
Also, please tell me about any other issues with the code you may find :-)

    double entropyOfDistribution(uint8_t *x, size_t length)
    {
        double entropy = 0.0;

        //Counting number of occurrences of a number (using "buckets")
        double *probabilityOfX = calloc(sizeof(double), 256);
        for (int i = 0; i < length; i++)
            probabilityOfX[x[i]] += 1.0;

        //Calculating the probabilities
        for (int i = 0; i < 256; i++)
            probabilityOfX[x[i]] /= length;

        //Calculating the sum of p(x)*lg(p(x)) for all X
        double sum = 0.0;
        for (int i = 0; i < 256; i++)
            if (probabilityOfX[i] > 0.0)
                sum += probabilityOfX[i] * log2(probabilityOfX[i]);

        entropy = -1.0 * sum;
        free(probabilityOfX);
        return entropy;
    }

btw, this is the formula I implemented:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$
`for (int i = 0; i < 256; i++) probabilityOfX[x[i]] /= length;`, if `x` has a length of 10 (or anything else less than 256). I believe here you want `probabilityOfX[i] /= length;` – Jerry Coffin yesterday