Vertical Strips in a Scatter Plot
Imagine that you have a scatter plot, on top of which you draw a narrow vertical strip. The
Vertical Strips
Drawing vertical strips on top of a scatter plot will result in the
This new data set can also be used to construct a histogram, which can subsequently be used to assess the assumption that the residuals are normally distributed. To the extent that the histogram matches the normal distribution, the residuals are normally distributed. This gives us an indication of how well our sample can predict a normal distribution in the population.
Residual Histogram
To the extent that a residual histogram matches the normal distribution, the residuals are normally distributed.
Homoscedasticity Versus Heteroscedasticity
When various vertical strips drawn on a scatter plot, and their corresponding data sets, show a similar pattern of spread, the plot can be said to be homoscedastic. Another way of putting this is that the prediction errors will be similar along the regression line.
In technical terms, a data set is homoscedastic if all random variables in the sequence have the same finite variance. A residual plot displaying homoscedasticity should appear to resemble a horizontal football. The presence of this shape lets us know if we can use the regression method. The assumption of homoscedasticity simplifies mathematical and computational treatment; however, serious violations in homoscedasticity may result in overestimating the goodness of fit.
In regression analysis, one assumption of the fitted model (to ensure that the least-squares estimators are each a best linear unbiased estimator of the respective population parameters) is that the standard deviations of the error terms are constant and do not depend on the
When a scatter plot is heteroscedastic, the prediction errors differ as we go along the regression line. In technical terms, a data set is heteroscedastic if there are sub-populations that have different variabilities from others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion.
The possible existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, because the presence of heteroscedasticity can invalidate statistical tests of significance that assume that the modelling errors are uncorrelated and normally distributed and that their variances do not vary with the effects being modelled. Similarly, in testing for differences between sub-populations using a location test, some standard tests assume that variances within groups are equal.