Rank Correlation
In statistics, a rank correlation is any of several statistics that measure the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of ordering labels ("first", "second", "third", and so on) to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.
If, for example, one variable is the identity of a college basketball program and another variable is the identity of a college football program, one could test for a relationship between the poll rankings of the two types of program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank correlation coefficient can measure that relationship, and the significance of the coefficient can show whether the measured relationship is small enough that it could plausibly be a coincidence.
If there is only one variable, the identity of a college football program, but it is subject to two different poll rankings (say, one by coaches and one by sportswriters), then the similarity of the two different polls' rankings can be measured with a rank correlation coefficient.
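As a minimal sketch of what "ranking" means here, the Python snippet below turns raw poll scores into rank labels (1 for first, 2 for second, and so on). The function name `to_ranks` and the poll point totals are hypothetical, chosen only for illustration, and the sketch assumes that no two scores are tied.

```python
def to_ranks(scores):
    """Return the rank of each observation, with 1 assigned to the highest score.

    Illustrative helper; assumes all scores are distinct (no ties).
    """
    # Indices of the observations, ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical poll points for the same four programs, listed in a fixed order.
coaches_points = [310, 450, 120, 275]
writers_points = [300, 410, 150, 390]

coaches_ranks = to_ranks(coaches_points)  # one ranking of the programs
writers_ranks = to_ranks(writers_points)  # a second ranking of the same programs
```

The two resulting lists are the kind of paired rankings that a rank correlation coefficient compares.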
Some of the more popular rank correlation statistics include Spearman's rho ($\rho$) and Kendall's tau ($\tau$).
Spearman's $\rho$
Spearman developed a method of measuring rank correlation known as Spearman's rank correlation coefficient. It is generally denoted by $\rho$ (or $r_s$). Three cases are distinguished when calculating it:
- When ranks are given
- When ranks are not given
- Repeated ranks
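The formula for calculating Spearman's rank correlation coefficient, when no ranks are repeated, is:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the two ranks of the $i$-th observation and $n$ is the number of observations.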
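The calculation can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: it assumes the usual no-ties form of the coefficient, $1 - 6\sum d_i^2 / (n(n^2 - 1))$, with $d_i$ the difference between the two ranks of observation $i$, and it assumes the ranks are already given. The function name `spearman_rho` and the example rankings are hypothetical.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's coefficient from two lists of ranks (no repeated ranks assumed)."""
    n = len(rank_x)
    if n != len(rank_y):
        raise ValueError("both rankings must cover the same observations")
    if n < 2:
        raise ValueError("at least two observations are required")
    # Sum of squared differences between the paired ranks.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)  # sum of d^2 is 4, so rho = 1 - 24/120
```

When ranks are repeated, this shortcut formula no longer applies exactly; a common alternative is to assign average ranks to the tied observations and compute the Pearson correlation of the ranks.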
Kendall's $\tau$
The definition of the Kendall coefficient is as follows:
Let $(x_1, y_1), \ldots, (x_n, y_n)$ be a set of observations of the joint random variables $X$ and $Y$. For any pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ with $i < j$:
- The observations are said to be concordant if the ranks for both elements agree, that is, if both $x_i > x_j$ and $y_i > y_j$, or if both $x_i < x_j$ and $y_i < y_j$.
- The observations are said to be discordant if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$.
- The observations are neither concordant nor discordant if $x_i = x_j$ or $y_i = y_j$.
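The Kendall $\tau$ coefficient for $n$ observations is defined as:
$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{n(n-1)/2}$$
and has the following properties: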
- The denominator is the total number of pair combinations, so the coefficient must be in the range $-1 \leq \tau \leq 1$.
- If the agreement between the two rankings is perfect (i.e., the two rankings are the same) the coefficient has value $1$.
- If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other) the coefficient has value $-1$.
- If $X$ and $Y$ are independent, then we would expect the coefficient to be approximately zero.
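The pair-counting definition can be sketched directly in Python. This is an illustrative sketch under stated assumptions: it takes the coefficient to be the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs $n(n-1)/2$, and it counts tied pairs as neither concordant nor discordant. The function name `kendall_tau` and the sample data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's coefficient by direct pair counting (quadratic time)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("at least two observations are required")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product shows whether x and y order the pair the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y: neither concordant nor discordant.
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical rankings of five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
tau = kendall_tau(judge_a, judge_b)  # 8 concordant and 2 discordant pairs out of 10
```

Counting every pair keeps the sketch close to the definition; it is practical only for small samples, since the number of pairs grows quadratically with $n$.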
Kendall's