goodness of fit
(noun)
how well a statistical model fits a set of observations
Examples of goodness of fit in the following topics:
-
Practice 1: Goodness-of-Fit Test
- The cumulative number of AIDS cases reported for Santa Clara County is broken down by ethnicity as follows: (Source: HIV/AIDS Epidemiology Santa Clara County, Santa Clara County Public Health Department, May 2011)
- The percentage of each ethnic group in Santa Clara County is as follows:
- If the ethnicity of AIDS victims followed the ethnicity of the total county population, fill in the expected number of cases per ethnic group.
- Perform a goodness-of-fit test to determine whether the make-up of AIDS cases follows the ethnicity of the general population of Santa Clara County.
- Does it appear that the pattern of AIDS cases in Santa Clara County corresponds to the distribution of ethnic groups in this county?
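The expected-count step in this practice can be sketched in a few lines of Python. This is only an illustration: the counts and population shares below are made-up placeholders, not the Santa Clara County figures, and the function name `expected_counts` is an assumption introduced here.

```python
def expected_counts(observed, population_shares):
    """Expected count per group if cases followed the population shares."""
    total = sum(observed)  # total number of observed cases
    return [total * share for share in population_shares]


# Hypothetical illustration only; these are NOT the Santa Clara County data.
observed = [50, 30, 20]                # observed cases per group
population_shares = [0.5, 0.25, 0.25]  # share of each group in the population

expected = expected_counts(observed, population_shares)
print(expected)
```

Each expected count is simply the total number of cases multiplied by that group's share of the population, which is what "following the ethnicity of the total county population" would imply under the null hypothesis.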
-
The Chi-Square Distribution: Comparison Summary of the Chi-Square Tests Goodness-of-Fit, Independence and Homogeneity
- Goodness-of-Fit: Use the Goodness-of-Fit Test to decide whether a population with unknown distribution "fits" a known distribution.
- In this case there will be a single qualitative survey question or a single outcome of an experiment from a single population.
- Goodness-of-Fit is typically used to see if the population is uniform (all outcomes occur with equal frequency), the population is normal, or the population is the same as another population with known distribution.
- The null and alternative hypotheses are: $H_0$: The population fits the given distribution.
- $H_a$: The population does not fit the given distribution.
-
Goodness of Fit
- The goodness of fit test determines whether the data "fit" a particular distribution or not.
- Goodness of fit means how well a statistical model fits a set of observations.
- where $n$ is the number of categories. The goodness-of-fit test is almost always right-tailed.
- These hypotheses hold for all chi-square goodness of fit tests.
- The $\nu$ in a chi-square goodness of fit test is equal to the number of categories, $c$, minus one ($\nu=c-1$).
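For reference, the usual form of the test statistic that these excerpts describe can be written as follows. The symbols $O_i$ (observed count in category $i$) and $E_i$ (expected count in category $i$) are notation assumed here, and $c$ is the number of categories, the same count that one excerpt above calls $n$.

$$\chi^2 = \sum_{i=1}^{c} \frac{(O_i - E_i)^2}{E_i}, \qquad \nu = c - 1$$

Large values of $\chi^2$ indicate a poor fit, which is why the test is almost always right-tailed.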
-
Try these true/false questions.
- In a Goodness-of-Fit test, the expected values are the values we would expect if the null hypothesis were true.
- In general, if the observed values and expected values of a Goodness-of-Fit test are not close together, then the test statistic can get very large and on a graph will be way out in the right tail.
- Use a Goodness-of-Fit test to determine if high school principals believe that students are absent equally during the week or not.
- The test to use to determine if a six-sided die is fair is a Goodness-of-Fit test.
- In a Goodness-of-Fit test, if the p-value is 0.0113, in general, do not reject the null hypothesis.
-
Example: Test for Goodness of Fit
- The Chi-square test for goodness of fit compares the expected and observed values to determine how well an experimenter's predictions fit the data.
- Pearson's chi-squared test uses a measure of goodness of fit, which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:
- $H_0$: The absent days occur with equal frequencies—that is, they fit a uniform distribution.
- $H_a$: The absent days occur with unequal frequencies—that is, they do not fit a uniform distribution.
- Support the use of Pearson's chi-squared test to measure goodness of fit
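The statistic described in this example (squared differences between observed and expected counts, each divided by the expected count, then summed) can be sketched as below. The absence counts are hypothetical values chosen for illustration, and `pearson_chi_square` is a name introduced here, not one taken from the source.

```python
def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical absence counts for Monday through Friday (illustration only).
observed = [15, 12, 9, 9, 15]

# Under H_0 (uniform distribution) every day has the same expected count.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

statistic = pearson_chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of categories minus one

print(statistic, degrees_of_freedom)
```

The statistic would then be compared against the chi-square distribution with the stated degrees of freedom, using the right tail.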
-
Student Learning Outcomes
- By the end of this chapter, the student should be able to:
-
Using simulation for goodness of fit tests
- Simulation methods may also be used to test goodness of fit.
- We do this many times (e.g. 10,000 times), and then examine the distribution of these simulated chi-square test statistics.
- This distribution will be a very precise null distribution for the test statistic $X^2$ if the probabilities are accurate, and we can find the upper tail of this null distribution, using a cutoff of the observed test statistic, to calculate the p-value.
- Section 6.3 introduced an example where we considered whether jurors were racially representative of the population.
- Figure 6.21 shows the simulated null distribution using 100,000 simulated values with an overlaid curve of the chi-square distribution.
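The simulation approach described above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the observed counts and null probabilities are hypothetical (they are not the juror data), and the function names and the fixed seed are choices made here for illustration.

```python
import random


def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_p_value(observed, null_probs, n_sims=10_000, seed=0):
    """Estimate the upper-tail p-value by simulating samples under H_0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = sum(observed)
    categories = list(range(len(null_probs)))
    expected = [n * p for p in null_probs]
    observed_stat = pearson_chi_square(observed, expected)

    exceed = 0
    for _ in range(n_sims):
        # Draw one sample of size n from the null distribution and tally it.
        counts = [0] * len(null_probs)
        for category in rng.choices(categories, weights=null_probs, k=n):
            counts[category] += 1
        # Count simulated statistics at least as large as the observed one.
        if pearson_chi_square(counts, expected) >= observed_stat:
            exceed += 1
    return observed_stat, exceed / n_sims


# Hypothetical counts and null probabilities (illustration only).
observed = [15, 12, 9, 9, 15]
null_probs = [0.2, 0.2, 0.2, 0.2, 0.2]

stat, p_value = simulate_p_value(observed, null_probs)
print(stat, p_value)
```

The fraction of simulated statistics at or beyond the observed one plays the role of the upper-tail area; more simulations give a more precise estimate at the cost of run time.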
-
Summary of Formulas
- Use goodness-of-fit to test whether a data set fits a particular probability distribution.
- The degrees of freedom are the number of cells or categories - 1.
- The degrees of freedom are equal to (number of columns - 1)(number of rows - 1).
- NOTE: The expected value for each cell needs to be at least 5 in order to use the Goodness-of-Fit, Independence and Homogeneity tests.
- The degrees of freedom are the number of samples - 1.
-
Introduction to testing for goodness of fit using chi-square
- Given a sample of cases that can be classified into several groups, determine if the sample is representative of the general population.
- Each of these scenarios can be addressed using the same statistical test: a chi-square test.
- In the first case, we consider data from a random sample of 275 jurors in a small county.
- If the jury is representative of the population, then the proportions in the sample should roughly reflect the population of eligible jurors, i.e. registered voters.
- A second application, assessing the fit of a distribution, is presented at the end of this section.
-
Summary
- Line of Best Fit or Least Squares Line (LSL): $\hat{y} = a + bx$, where $x$ = independent variable and $y$ = dependent variable
- Used to determine whether a line of best fit is good for prediction.
- Sum of Squared Errors (SSE): The smaller the SSE, the better the original set of points fits the line of best fit.
- Outlier: A point that does not seem to fit the rest of the data.
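The line of best fit $\hat{y} = a + bx$ and the SSE from this summary can be sketched as follows, using the standard least-squares formulas for the slope and intercept. The data points are hypothetical, and the function names are introduced here for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals between the points and the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points (illustration only).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

a, b = least_squares_line(xs, ys)
sse = sum_squared_errors(xs, ys, a, b)
print(a, b, sse)
```

A smaller SSE means the points lie closer to the fitted line; a point with an unusually large residual is a candidate outlier.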