Cross tabulation (crosstabs for short) is a statistical process that summarizes categorical data in a contingency table. It is used heavily in survey research, business intelligence, engineering, and scientific research. It provides a basic picture of the interrelation between two variables and can help reveal interactions between them.
In survey research (e.g., polling, market research), a "crosstab" is any table showing summary statistics. Commonly, crosstabs in survey research combine several different tables. For example, the crosstab below combines multiple contingency tables and tables of averages.
Crosstab of Cola Preference by Age and Gender
A crosstab is a combination of various tables showing summary statistics.
Contingency Tables
A contingency table is a table in matrix format that displays the (multivariate) frequency distribution of the variables. A crucial problem in multivariate statistics is finding the direct dependence structure underlying the variables contained in high-dimensional contingency tables. If some of the conditional independences are revealed, then even the data can be stored more efficiently. To do this, one can use information theory concepts, which derive information solely from the probability distribution. Probabilities can be estimated directly from the contingency table as relative frequencies.
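To illustrate how relative frequencies and an information-theoretic measure follow from a table of counts, here is a minimal sketch. The counts and the labels `x0`, `x1`, `y0`, `y1` are hypothetical, and mutual information is used only as one example of an information theory concept; the text does not name a specific measure.

```python
import math

# Hypothetical 2x2 table of counts for two categorical variables X and Y.
# These numbers are illustrative assumptions, not data from the text.
counts = {
    ("x0", "y0"): 43,
    ("x0", "y1"): 9,
    ("x1", "y0"): 44,
    ("x1", "y1"): 4,
}

total = sum(counts.values())

# Joint probabilities estimated as relative frequencies.
joint = {cell: n / total for cell, n in counts.items()}

# Marginal probabilities obtained by summing the joint over the other variable.
p_x = {}
p_y = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Mutual information in bits; it is zero exactly when X and Y are independent.
mutual_information = sum(
    p * math.log2(p / (p_x[x] * p_y[y]))
    for (x, y), p in joint.items()
    if p > 0
)

print(f"I(X; Y) = {mutual_information:.4f} bits")
```

A value close to zero suggests that the table carries little evidence of dependence between the two variables.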
As an example, suppose that we have two variables, sex (male or female) and handedness (right- or left-handed). Further suppose that 100 individuals are randomly sampled from a very large population as part of a study of sex differences in handedness. A contingency table can be created to display the numbers of individuals who are male and right-handed, male and left-handed, female and right-handed, and female and left-handed.
Contingency Table
Contingency table created to display the numbers of individuals who are male and right-handed, male and left-handed, female and right-handed, and female and left-handed.
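As a rough sketch of how such a table can be built from raw observations, the following counts each (sex, handedness) pair. The 100 observations are made up for illustration and are not the study data.

```python
from collections import Counter

# Hypothetical sample of 100 individuals, one (sex, handedness) pair each.
# The split below is an illustrative assumption, not data from the text.
observations = (
    [("male", "right")] * 43
    + [("male", "left")] * 9
    + [("female", "right")] * 44
    + [("female", "left")] * 4
)

# Count how often each combination of categories occurs.
cell_counts = Counter(observations)

rows = ["male", "female"]
cols = ["right", "left"]

# Arrange the counts as a matrix: one row per sex, one column per handedness.
table = [[cell_counts[(r, c)] for c in cols] for r in rows]

# Print the table with a header row.
print(f"{'':8}" + "".join(f"{c:>8}" for c in cols))
for r, row in zip(rows, table):
    print(f"{r:8}" + "".join(f"{n:>8}" for n in row))
```

Libraries such as pandas provide a ready-made `pandas.crosstab` function for the same task; the standard-library version above is used only to keep the example dependency-free.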
The numbers of males, females, right-handed individuals, and left-handed individuals are called marginal totals. The grand total (i.e., the total number of individuals represented in the contingency table) is the number in the bottom right corner.
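The marginal totals and the grand total are simple sums over the rows and columns of the table. A minimal sketch, using hypothetical cell counts that add up to 100 individuals:

```python
# Hypothetical cell counts (illustrative assumption, not data from the text).
table = {
    "male": {"right": 43, "left": 9},
    "female": {"right": 44, "left": 4},
}

# Row marginal totals: number of males and number of females.
row_totals = {sex: sum(cells.values()) for sex, cells in table.items()}

# Column marginal totals: number of right- and left-handed individuals.
col_totals = {}
for cells in table.values():
    for hand, n in cells.items():
        col_totals[hand] = col_totals.get(hand, 0) + n

# Grand total: the bottom right corner of the printed table.
grand_total = sum(row_totals.values())

print(row_totals, col_totals, grand_total)
```

Summing either set of marginal totals gives the same grand total, which is a quick consistency check on any contingency table.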
The table allows us to see at a glance that the proportion of men who are right-handed is about the same as the proportion of women who are right-handed, although the proportions are not identical. If the proportions of individuals in the different columns vary significantly between rows (or vice versa), we say that there is a contingency between the two variables. In other words, the two variables are not independent. If there is no contingency, we say that the two variables are independent.
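One common way to judge whether the row proportions differ by more than chance is Pearson's chi-squared test of independence. The text does not prescribe a particular test, so this is a sketch under that assumption, using hypothetical counts and no continuity correction. The p-value formula used here applies only to a 2x2 table (one degree of freedom).

```python
import math

# Hypothetical observed counts (illustrative assumption, not data from the text).
observed = {
    "male": {"right": 43, "left": 9},
    "female": {"right": 44, "left": 4},
}

row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for c, n in cells.items():
        col_totals[c] = col_totals.get(c, 0) + n
grand_total = sum(row_totals.values())

# Proportion of right-handed individuals within each row.
right_share = {r: observed[r]["right"] / row_totals[r] for r in observed}

# Expected count in each cell if the two variables were independent.
expected = {
    r: {c: row_totals[r] * col_totals[c] / grand_total for c in col_totals}
    for r in row_totals
}

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in observed
    for c in observed[r]
)

# For one degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p-value means the observed differences in proportions are compatible with independence; a small one points to a contingency between the variables.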
Standard Components of a Crosstab
- Multiple columns - each column refers to a specific sub-group in the population (e.g., men). The columns are sometimes referred to as banner points or cuts (and the rows are sometimes referred to as stubs).
- Significance tests - typically either column comparisons, which test for differences between columns and display the results using letters, or cell comparisons, which use color or arrows to identify a cell in a table that stands out in some way (as in the example above).
- Nets (or netts) - sub-totals.
- One or more of the following: percentages, row percentages, column percentages, indexes, or averages.
- Unweighted sample sizes (i.e., counts).
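Several of these components can be derived directly from the cell counts. The sketch below computes column sample sizes, column percentages, and row percentages for a small table whose columns are sub-groups. The counts and the labels `men` and `women` are hypothetical; significance tests and nets are left out.

```python
# Hypothetical counts: rows (stubs) are answer categories,
# columns (banner points) are sub-groups. Illustrative assumption only.
counts = {
    "right": {"men": 43, "women": 44},
    "left": {"men": 9, "women": 4},
}

# Unweighted sample size of each column sub-group.
col_n = {}
for cells in counts.values():
    for group, n in cells.items():
        col_n[group] = col_n.get(group, 0) + n

# Total count of each row across all sub-groups.
row_n = {stub: sum(cells.values()) for stub, cells in counts.items()}

# Column percentages: share of each answer within a sub-group.
col_pct = {
    stub: {g: 100 * n / col_n[g] for g, n in cells.items()}
    for stub, cells in counts.items()
}

# Row percentages: share of each sub-group within an answer category.
row_pct = {
    stub: {g: 100 * n / row_n[stub] for g, n in cells.items()}
    for stub, cells in counts.items()
}

for stub, cells in counts.items():
    for g, n in cells.items():
        print(
            f"{stub:>5} / {g:<5} n={n:>3} "
            f"col%={col_pct[stub][g]:5.1f} row%={row_pct[stub][g]:5.1f}"
        )
```

Column percentages add up to 100 within each sub-group, while row percentages add up to 100 within each answer category, so it matters which of the two a crosstab reports.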
Most general-purpose statistical software programs are able to produce simple crosstabs. Creation of the standard crosstabs used in survey research, as shown above, is typically done using specialist crosstab software packages, such as:
- New Age Media Systems (EzTab)
- SAS
- Quantum
- Quanvert
- SPSS Custom Tables
- IBM SPSS Data Collection Model programs
- Uncle
- WinCross
- Q
- SurveyCraft
- BIRT