Ask Fact Master home

Lukol - Anonymous Web Search Engine





Google Custom Search

Algorithms for calculating variance

In statistics, a formula for calculating the variance of a population of size n is:

\mathit{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n^2}

A formula for calculating the unbiased estimation of the population variance from n finite samples is:

\mathit{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n(n-1)}

The method of calculation may be more easily understood from the table below where the mean is 8.

i xi xi-mean (xi-mean)2
(index) (datum) (deviation) (squared deviation)
1 5 -3 9
2 7 -1 1
3 8 0 0
4 10 2 4
5 10 2 4
n=5 sum=40 0 18
  • mean = 40/5 = 8
  • variance = (5*338 - 402)/(5 * 4) = 4.5
  • standard deviation = \sqrt{\mathit{Variance}}= 2.12

Note: Details of the variance calculation:

338 = [52 + 72 + 82 + 102 + 102]
40 = [5 + 7 + 8 + 10 + 10]

Algorithm

Therefore a simple algorithm to calculate variance can be described by the following pseudocode:

double sum;
double sum_sqr;
double variance;
long n = data.length; // the number of elements in the data array (the actual syntax is language-specific)

for i = 1 to n
 sum += data[i];
 sum_sqr += ( data[i] * data[i] );
end for

variance = ((n * sum_sqr) - (sum * sum))/(n*(n-1));

Algorithm

Another algorithm which avoids large numbers in sum_sqr while summing up

double avg = 0;
double var = 0;
long n = data.length; // number of elements

for i = 1 to n
 avg = (avg*i + data[i]) / (i + 1);
 var = (var * (i - 1) + (data[i] - avg)*(data[i] - avg)) / i;
end for

return var; // resulting variance

The above algorithm does not seem to be correct. The resulting variance does not equal either one of the variances whose formulas are given at the top of the page.


Google Custom Search





This article uses material from Wikipedia and is licensed under the GNU Free Documentation License.


Served by Ask Fact Master in 0.10 secs.