Mann-Whitney U-Test

The Mann–Whitney $U$-test is a non-parametric test of the null hypothesis that two populations are the same, against the alternative that one population tends to have larger values than the other.

Learning Objective

Compare the Mann-Whitney $U$-test to Student's $t$-test

Key Points

Mann-Whitney has greater efficiency than the $t$-test on non-normal distributions, such as a mixture of normal distributions, and it is nearly as efficient as the $t$-test on normal distributions.
The test involves the calculation of a statistic, usually called $U$, whose distribution under the null hypothesis is known.
The first method to calculate $U$ involves choosing the sample which has the smaller ranks, then, for each observation in that sample, counting the observations in the other sample that have a smaller rank, and finally summing these counts.
The second method involves adding up the ranks for the observations which came from sample 1. The sum of ranks in sample 2 is now determinate, since the sum of all the ranks equals $\frac{N(N+1)}{2}$, where $N$ is the total number of observations.

Terms

ordinal data
A statistical data type consisting of numerical scores that exist on an ordinal scale, i.e. an arbitrary numerical scale where the exact numerical quantity of a particular value has no significance beyond its ability to establish a ranking over a set of data points.
tie
One or more equal values or sets of equal values in the data set.

Full Text

The Mann–Whitney $U$-test is a non-parametric test of the null hypothesis that two populations are the same against an alternative hypothesis, especially that a particular population tends to have larger values than the other. It has greater efficiency than the $t$-test on non-normal distributions, such as a mixture of normal distributions, and it is nearly as efficient as the $t$-test on normal distributions.

Assumptions and Formal Statement of Hypotheses

Although Mann and Whitney developed the test under the assumption of continuous responses, with the alternative hypothesis being that one distribution is stochastically greater than the other, there are many other ways to formulate the null and alternative hypotheses such that the test remains valid. A very general formulation is to assume that:

All the observations from both groups are independent of each other.
The responses are ordinal (i.e., one can at least say of any two observations which is the greater).
The distributions of both groups are equal under the null hypothesis, so that the probability of an observation from one population ($X$) exceeding an observation from the second population ($Y$) equals the probability of an observation from $Y$ exceeding an observation from $X$. That is, there is a symmetry between populations with respect to the probability of randomly drawing a larger observation.
Under the alternative hypothesis, the probability of an observation from one population ($X$) exceeding an observation from the second population ($Y$) (after exclusion of ties) is not equal to $0.5$. The alternative may also be stated in terms of a one-sided test, for example: $P(X > Y) + 0.5 \cdot P(X = Y) > 0.5$.

Calculations

The test involves the calculation of a statistic, usually called $U$, whose distribution under the null hypothesis is known. In the case of small samples, the distribution is tabulated, but for sample sizes above about 20, approximation using the normal distribution is fairly good.
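The normal approximation mentioned above can be sketched in a few lines. The sketch relies on the standard result that, under the null hypothesis and without ties, $U$ has mean $\frac{n_1 n_2}{2}$ and standard deviation $\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$. These formulas are not stated in this text, and the function name and the input values are purely illustrative. No tie or continuity correction is applied.

```python
import math

def mann_whitney_normal_approx(u, n1, n2):
    """Return (z, two_sided_p) for a U value via the normal approximation.

    Assumes no ties and applies no continuity correction.
    """
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical values: U = 200 with 25 observations in each sample
z, p = mann_whitney_normal_approx(u=200, n1=25, n2=25)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

For small samples the tabulated exact distribution should be preferred, as noted above.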

There are two ways of calculating $U$ by hand. For either method, we must first arrange all the observations into a single ranked series. That is, rank all the observations without regard to which sample they are in.

Method One

For small samples a direct method is recommended. It is very quick, and gives an insight into the meaning of the $U$ statistic.

Choose the sample for which the ranks seem to be smaller (the only reason to do this is to make computation easier). Call this "sample 1," and call the other sample "sample 2."
For each observation in sample 1, count the number of observations in sample 2 that have a smaller rank (count a half for any that are equal to it). The sum of these counts is $U$.
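The counting procedure of Method One can be sketched as follows. Comparing raw values is equivalent to comparing ranks, because ranking preserves order. The function name and the data are hypothetical and chosen only for illustration.

```python
def u_direct(sample1, sample2):
    """For each value in sample1, count the values in sample2 that are
    smaller, counting one half for each tie, and sum the counts."""
    u = 0.0
    for x in sample1:
        for y in sample2:
            if y < x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u

# Hypothetical data; the value 2.3 appears in both samples (a tie)
sample1 = [1.1, 2.3, 2.9, 4.0]
sample2 = [2.3, 3.5, 4.8, 5.2, 6.1]

u1 = u_direct(sample1, sample2)
u2 = u_direct(sample2, sample1)
print(u1, u2)  # 3.5 16.5
```

Swapping the roles of the two samples gives the other value of $U$. In this example the two values sum to $4 \times 5 = 20$, the product of the sample sizes.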

Method Two

For larger samples, a formula can be used.

First, add up the ranks for the observations that came from sample 1. The sum of ranks in sample 2 is now determinate, since the sum of all the ranks equals:

$\dfrac{N(N + 1)}{2}$

where $N$ is the total number of observations. $U$ is then given by:

$U_1=R_1 - \dfrac{n_1(n_1+1)}{2}$

where $n_1$ is the sample size for sample 1, and $R_1$ is the sum of the ranks in sample 1. Note that it doesn't matter which of the two samples is considered sample 1. The analogous formula for sample 2 is $U_2 = R_2 - \dfrac{n_2(n_2+1)}{2}$. The smaller of $U_1$ and $U_2$ is the one used when consulting significance tables.
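A minimal sketch of Method Two follows. It assumes that tied observations receive the average of the ranks they span (midranks), which is consistent with counting a half for ties in Method One. It also uses the standard identity $U_1 + U_2 = n_1 n_2$, which is not stated in this text, to obtain $U_2$. The function names and data are hypothetical.

```python
def midranks(values):
    """Rank values from 1 to N, giving tied values the average of the
    positions they occupy in the sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def u_from_ranks(sample1, sample2):
    """Compute (U1, U2) from the rank sum of sample 1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])               # R1: sum of ranks in sample 1
    u1 = r1 - n1 * (n1 + 1) / 2.0      # U1 = R1 - n1(n1 + 1)/2
    u2 = n1 * n2 - u1                  # assumes U1 + U2 = n1 * n2
    return u1, u2

# Hypothetical data; the value 2.3 appears in both samples (a tie)
u1, u2 = u_from_ranks([1.1, 2.3, 2.9, 4.0], [2.3, 3.5, 4.8, 5.2, 6.1])
print(u1, u2, min(u1, u2))  # 3.5 16.5 3.5
```

Here $R_1 = 1 + 2.5 + 4 + 6 = 13.5$, so $U_1 = 13.5 - 10 = 3.5$, and the smaller of the two values, $3.5$, is the one to look up in a table.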

Example Statement of Results

In reporting the results of a Mann–Whitney test, it is important to state:

a measure of the central tendencies of the two groups (means or medians; since the Mann–Whitney is an ordinal test, medians are usually recommended)
the value of $U$
the sample sizes
the significance level

In practice some of this information may already have been supplied and common sense should be used in deciding whether to repeat it. A typical report might run:

"Median latencies in groups $E$ and $C$ were $153$ and $247$ ms; the distributions in the two groups differed significantly (Mann–Whitney $U=10.5$, $n_1=n_2=8$, $P < 0.05$, two-tailed)."

Comparison to Student's $t$-Test

The $U$-test is more widely applicable than the independent-samples Student's $t$-test, and the question arises of which should be preferred.

Ordinal Data

$U$ remains the logical choice when the data are ordinal but not interval scaled, so that the spacing between adjacent values cannot be assumed to be constant.

Robustness

As it compares the sums of ranks, the Mann–Whitney test is less likely than the $t$-test to spuriously indicate significance because of the presence of outliers (i.e., Mann–Whitney is more robust).
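A small illustration of why rank-based statistics are insensitive to outliers: making the largest observation far more extreme leaves every rank, and therefore $U$, unchanged, while the sample mean used by the $t$-test shifts dramatically. The data and function names below are hypothetical, and the sketch shows only this one mechanism, not a full comparison of the two tests.

```python
def u_statistic(sample1, sample2):
    """Count pairs (x, y) with y < x, counting one half for ties."""
    return sum(
        1.0 if y < x else 0.5 if y == x else 0.0
        for x in sample1
        for y in sample2
    )

def mean(values):
    return sum(values) / len(values)

# Hypothetical measurements
group_a = [12.0, 14.0, 15.0, 17.0, 19.0]
group_b = [16.0, 18.0, 20.0, 21.0, 23.0]
# Same as group_b, but the largest value is replaced by an extreme outlier
group_b_outlier = [16.0, 18.0, 20.0, 21.0, 2300.0]

print(u_statistic(group_a, group_b))          # 3.0
print(u_statistic(group_a, group_b_outlier))  # 3.0 (ranks are unchanged)
print(mean(group_b), mean(group_b_outlier))   # 19.6 475.0
```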

Efficiency

For distributions sufficiently far from normal and for sufficiently large sample sizes, the Mann–Whitney test is considerably more efficient than the $t$-test. Overall, its robustness makes Mann–Whitney more widely applicable than the $t$-test. For large samples from the normal distribution, the efficiency loss compared to the $t$-test is only about 5%, so one can recommend Mann–Whitney as the default test for comparing interval or ordinal measurements with similar distributions.
