Validity and Reliability of Personality Assessments

Personality assessments vary in their levels of validity and reliability.

Learning Objective

Evaluate the concepts of validity and reliability in the context of personality assessment

Key Points

Validity refers to whether or not a test actually measures the construct that it is meant to measure; reliability refers to the degree to which a test produces stable and consistent results.
Objective tests tend to be relatively free from rater bias and are thought to have more validity than projective tests.
The challenge of objective tests, however, is that they are subject to the willingness and ability of the respondents to be open, honest, and self-reflective enough to represent and report their true personality.
Projective tests have been criticized for having poor reliability and validity, for lacking scientific evidence, and for relying too much on the subjective judgment of a clinician.
One problem with personality measures is that individuals have a tendency to endorse vague generalizations that could apply to anyone; this is known as the Forer effect.

Terms

psychometric
The design of psychological tests to measure intelligence, aptitude, and personality, and the analysis and interpretation of their results.
reliability
The overall consistency of a measure; its ability to produce similar results under consistent conditions.
validity
The extent to which a concept, conclusion, or measurement is well-founded and corresponds accurately to the real world.
factor analysis
A statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables.

Full Text

Not all personality measures are created equal. When it comes to examining the validity and reliability of personality measures, some have better psychometric properties than others. Validity refers to whether or not a test actually measures the construct that it is meant to measure; reliability refers to the degree to which a test produces stable and consistent results.

Objective Tests

Objective tests (such as the Myers-Briggs Type Indicator, NEO PI-R, Minnesota Multiphasic Personality Inventory, 16PF, and Eysenck Personality Questionnaire) are thought to be relatively free from rater bias, or the influence of the examiner's own beliefs. Because of this, objective tests are said to have more validity than projective tests. The challenge of objective tests, however, is that they are subject to the willingness and ability of the respondents to be open, honest, and self-reflective enough to represent and report their true personality; this limits their reliability.

The Minnesota Multiphasic Personality Inventory (MMPI) attempts to account for these weaknesses by including validity and reliability scales in addition to its clinical scales. One of the validity scales, the Lie Scale (or “L” Scale), consists of 15 items and is used to ascertain whether the respondent is “faking good” (in other words, under-reporting psychological problems in order to appear healthier). For example, if someone responds “yes” to a number of unrealistically positive items such as “I have never told a lie,” they may be trying to “fake good” or appear better than they actually are.
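The logic of a scale like this can be sketched in a few lines of code. This is only an illustration of counting endorsed items and comparing the count to a cutoff; it is not the MMPI's actual scoring procedure. The function names, the sample responses, and especially the cutoff value are hypothetical and are not taken from the MMPI manual. Only the item count of 15 comes from the text above.

```python
# Illustrative sketch only: NOT the official MMPI scoring procedure.
L_SCALE_ITEM_COUNT = 15   # item count stated in the text
HYPOTHETICAL_CUTOFF = 8   # assumed value for illustration, not from the MMPI manual


def lie_scale_raw_score(endorsed):
    """Count how many of the unrealistically positive items were endorsed."""
    if len(endorsed) != L_SCALE_ITEM_COUNT:
        raise ValueError("expected one response per L Scale item")
    return sum(1 for item in endorsed if item)


def may_be_faking_good(endorsed, cutoff=HYPOTHETICAL_CUTOFF):
    """Flag a response pattern whose raw score reaches the assumed cutoff."""
    return lie_scale_raw_score(endorsed) >= cutoff


# Hypothetical respondent who endorses 10 of the 15 items.
responses = [True] * 10 + [False] * 5
raw_score = lie_scale_raw_score(responses)
flagged = may_be_faking_good(responses)
```

A flag of this kind would only prompt a clinician to interpret the rest of the profile with caution; it does not by itself show that the respondent was dishonest.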

Reliability scales test the instrument’s consistency over time, assuring that if you take the MMPI today and then again five years later, your two scores will be similar. Beutler, Nussbaum, and Meredith (1988) gave the MMPI to newly recruited police officers and then to the same police officers two years later. After two years on the job, police officers’ responses indicated an increased vulnerability to alcoholism, somatic symptoms (vague, unexplained physical complaints), and anxiety. When the test was given an additional two years later (four years after starting on the job), the results suggested high risk for alcohol-related difficulties.
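Consistency over time is commonly quantified as test-retest reliability: the correlation between scores from two administrations of the same test to the same people. The sketch below computes a Pearson correlation for two sets of scores. The scores are invented for illustration and do not come from the MMPI or from the police officer study; the function and variable names are likewise assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scale scores for six people tested on two occasions.
time1_scores = [52, 61, 47, 70, 58, 65]
time2_scores = [54, 59, 49, 72, 55, 66]

retest_r = pearson_r(time1_scores, time2_scores)
print(f"test-retest correlation: {retest_r:.2f}")
```

A coefficient close to 1 indicates that people kept roughly the same rank order across the two occasions. A lower coefficient means scores shifted, which may reflect measurement error or genuine change in the people being tested.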

The MMPI-2, a revision of the original MMPI, addressed many of the original's limitations, thereby increasing its usefulness. For example, the original MMPI was intended to be used in clinical populations, and the normative sample (the sample of individuals whose scores are used as a baseline against which all test-takers' scores are compared) consisted of psychiatric patients. For a clinical population, this information can reveal what is normative for that particular population; however, it limits the test's use with nonclinical populations. The MMPI-2 used a normative sample drawn from the general population that was thought to be representative of all major demographic variables, expanding its applicability.

Many objective personality measures, such as the Eysenck Personality Questionnaire, were created after years of research. Eysenck spent many years working with factor analysis and conducting laboratory experiments. The result is that the Eysenck Personality Questionnaire has excellent reliability and validity. Additionally, there is a large body of research that demonstrates the practical uses of the Eysenck measure.

Projective Tests

In contrast to objective tests, projective tests are much more sensitive to the examiner's beliefs. Projective measures like the Rorschach Inkblot Test and the Thematic Apperception Test have been criticized for having poor reliability and validity, for lacking scientific evidence, and for relying too much on the subjective judgment of a clinician. Some projective tests, like the Rorschach, have undergone standardization procedures so they can be relatively effective in measuring depression, psychosis, and anxiety. In the Thematic Apperception Test, however, which involves open-ended storytelling, standardization of test administration is virtually nonexistent, making the test relatively low on validity and reliability. Projective tests are often considered best used for informational purposes only, and not as a true measure of personality.

For many decades, traditional projective tests have been used in cross-cultural personality assessments. However, it was found that test bias limited their usefulness. It is difficult to assess the personalities and lifestyles of members of widely divergent ethnic/cultural groups using personality instruments based on data from a single culture or race. Therefore, it was vital to develop other personality assessments that explore factors like race, language, and level of acculturation (Hoy-Watkins & Jenkins-Moore, 2008).

The Forer Effect

One problem with personality measures is that individuals have a tendency to endorse vague generalizations. This is one reason why horoscopes continue to be popular and trusted despite their lack of reliability or validity. In 1948, Bertram Forer gave a personality inventory to his students and then gave each of them what he claimed was a unique personality profile. He asked the students to rate how well the profile applied to them. What the students did not know was that they had all received the exact same profile, consisting of very generalized descriptions that could apply to almost anyone. On average, the students rated the profile as a near-excellent description of themselves.

In another study, students were given a personality inventory and then were given two personality profiles: an accurate one based upon the results of the inventory they took, and a generalized one that could apply to almost anyone. The students were then asked which of the two personality profiles was their own. More than half of the students selected the generalized profile as their own. Both of these studies demonstrate how personality measures can provide general or vague descriptions and still be accepted by individuals as accurate. This effect has come to be known as the Forer effect.

Astrological signs

Horoscopes are often endorsed because of the Forer effect. The generalized nature of the descriptions allows for a large number of individuals to believe that they are accurate.
