Correlation ratio

In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample.

Suppose each observation is y_{xi}, where x indicates the category the observation belongs to and i is the label of the particular observation within that category. We will write n_x for the number of observations in category x (not necessarily the same for different values of x) and

\overline{y_x}=\frac{\sum_i y_{xi}}{n_x} and \overline{y}=\frac{\sum_x n_x \overline{y_x}}{\sum_x n_x}

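where \overline{y_x} is the mean of category x and \overline{y} is the mean of the whole population or sample. The correlation ratio η (eta) is then defined so as to satisfy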

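\eta^2 = \frac{\sum_x n_x (\overline{y_x}-\overline{y})^2}{\sum_{xi} (y_{xi}-\overline{y})^2} which might be written as \frac{\sigma_{\overline{y}}^2}{\sigma_{y}^2}, that is, the weighted variance of the category means divided by the variance of all the observations.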
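As a rough illustration of the definition above, the following Python sketch computes \eta^2 directly from the two sums. The function name eta_squared and the sample data are hypothetical and chosen only for illustration; they do not come from any particular library.

```python
from collections import defaultdict


def eta_squared(categories, values):
    """Return eta squared for paired category labels and numeric observations."""
    # Group the observations y_{xi} by their category x.
    groups = defaultdict(list)
    for x, y in zip(categories, values):
        groups[x].append(y)

    # Overall mean: the n_x-weighted mean of the category means.
    n_total = sum(len(ys) for ys in groups.values())
    overall_mean = sum(sum(ys) for ys in groups.values()) / n_total

    # Numerator: sum over categories of n_x * (category mean - overall mean)^2.
    between = sum(
        len(ys) * (sum(ys) / len(ys) - overall_mean) ** 2
        for ys in groups.values()
    )
    # Denominator: sum over all observations of (y_{xi} - overall mean)^2.
    total = sum((y - overall_mean) ** 2 for ys in groups.values() for y in ys)

    if total == 0:
        raise ValueError("all observations are equal, so eta is undefined")
    return between / total


# Made-up data: two categories with three observations each.
labels = ["a", "a", "a", "b", "b", "b"]
scores = [1, 2, 3, 4, 5, 6]
example_eta_squared = eta_squared(labels, scores)
example_eta = example_eta_squared ** 0.5
```

For this made-up data the category means are 2 and 5 and the overall mean is 3.5, so the numerator is 13.5 and the denominator is 17.5. When every observation in a category equals that category's mean, the two sums coincide and \eta^2 is 1.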

It is worth noting that if the relationship between the values of x and the values of \overline{y_x} is linear (which is certainly true when there are only two possibilities for x), the correlation ratio has the same magnitude as the correlation coefficient; if not, the correlation ratio will be larger in magnitude, though still no more than 1. It can therefore be used for judging non-linear relationships.
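The comparison with the correlation coefficient can be checked numerically. The sketch below assumes numeric category values x and uses two made-up data sets: one in which the category means lie on a straight line and one in which they do not. The helper names eta_squared and pearson_r are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def eta_squared(xs, ys):
    """Between-category sum of squares divided by total sum of squares."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    overall_mean = sum(ys) / len(ys)
    between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    total = sum((y - overall_mean) ** 2 for y in ys)
    return between / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 1, 2, 2, 3, 3]

# Category means 2, 4, 6 are linear in x.
linear_ys = [1, 3, 3, 5, 5, 7]
linear_eta2 = eta_squared(xs, linear_ys)
linear_r2 = pearson_r(xs, linear_ys) ** 2

# Category means 2, 6, 2 rise and then fall, so they are not linear in x.
curved_ys = [1, 3, 5, 7, 1, 3]
curved_eta2 = eta_squared(xs, curved_ys)
curved_r2 = pearson_r(xs, curved_ys) ** 2
```

In the linear data set the two squared quantities agree (both are 16/22, roughly 0.73). In the second data set the correlation coefficient is essentially zero while \eta^2 is about 0.78, which shows how the correlation ratio can pick up a relationship that the correlation coefficient misses.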

This article is licensed under the GNU Free Documentation License. It uses material from a Wikipedia article.