Can a test that has poor reliability be valid?
For every dimension of interest and every specific question or set of questions, there are a vast number of ways to word the questions. Although the guiding principle should be the specific purposes of the research, there are better and worse questions for any particular operationalization. How should the measures be evaluated? Two of the primary criteria of evaluation in any measurement or observation are whether we are measuring what we intend to measure and whether the same measurement process yields the same results.
These two concepts are validity and reliability. Reliability is concerned with questions of stability and consistency: does the same measurement tool yield stable and consistent results when repeated over time? Think about measurement processes in other contexts. In construction or woodworking, a tape measure is a highly reliable measuring instrument. Say you have a piece of wood that is 2 1/2 feet long. You measure it once with the tape measure and get 2 1/2 feet; measure it again and you get the same result.

Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). To continue with the example of measuring the piece of wood, a tape measure that has been made with accurate spacing for inches, feet, etc. should yield valid results as well. Measuring this piece of wood with a "good" tape measure should produce a correct measurement of the wood's length.

To apply these concepts to social research, we want to use measurement tools that are both reliable and valid. We want questions that yield consistent responses when asked multiple times - this is reliability. Similarly, we want questions that get accurate responses from respondents - this is validity.

Reliability
Reliability refers to a condition where a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeated measurements. Perhaps the most straightforward way to assess reliability is to check a measure against the following three criteria. Measures that are high in reliability should exhibit all three.

Test-Retest Reliability
When a researcher administers the same measurement tool multiple times (asks the same question, follows the same research procedures, etc.), do they obtain consistent results, assuming that there has been no change in whatever they are measuring? This is really the simplest method for assessing reliability: when a researcher asks the same person the same question twice ("What's your name?"), do they get back the same answer both times? If so, the measure has test-retest reliability. The measurement of the piece of wood discussed earlier has high test-retest reliability.

Inter-Item Reliability
This is a dimension that applies to cases where multiple items are used to measure a single concept. In such cases, answers to the items should be consistent with one another.

Interobserver Reliability
Interobserver reliability concerns the extent to which different interviewers or observers using the same measure get equivalent results.
If different observers or interviewers use the same instrument to score the same thing, their scores should match. For example, the interobserver reliability of an observational assessment of parent-child interaction is often evaluated by showing two observers a videotape of a parent and child at play. These observers are asked to use an assessment tool to score the interactions between parent and child on the tape. If the instrument has high interobserver reliability, the scores of the two observers should match.

Validity
To reiterate, validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). How do we assess the validity of a set of measurements? A valid measure should satisfy four criteria.

Face Validity
This criterion is an assessment of whether a measure appears, on the face of it, to measure the concept it is intended to measure. This is a bare-minimum assessment: if a measure cannot satisfy this criterion, then the other criteria are inconsequential. We can think about observational measures of behavior that would have face validity. For example, striking out at another person would have face validity as an indicator of aggression. Similarly, offering assistance to a stranger would meet the criterion of face validity for helping. However, asking people about their favorite movie to measure racial prejudice has little face validity.

Content Validity
Content validity concerns the extent to which a measure adequately represents all facets of a concept. Consider a series of questions that serve as indicators of depression (don't feel like eating, lost interest in things usually enjoyed, etc.). If there were other kinds of common behaviors that mark a person as depressed that were not included in the index, then the index would have low content validity, since it did not adequately represent the concept of depression.

Criterion-Related Validity
Criterion-related validity applies to instruments that have been developed to serve as an indicator of a specific trait or behavior, either now or in the future. For example, think about the driving test as a social measurement that has pretty good predictive validity. That is to say, an individual's performance on a driving test correlates well with their driving ability.

Construct Validity
For many things we want to measure, however, there is not necessarily a pertinent criterion available. In this case, we turn to construct validity, which concerns the extent to which a measure is related to other measures as specified by theory or previous research. Does a measure stack up with other variables the way we expect it to? A good example of this form of validity comes from early self-esteem studies. Self-esteem refers to a person's sense of self-worth or self-respect. Clinical observations in psychology had shown that people who had low self-esteem often had depression. Therefore, to establish the construct validity of the self-esteem measure, the researchers showed that those with higher scores on the self-esteem measure had lower depression scores, while those with low self-esteem had higher rates of depression.

Validity and Reliability Compared
So what is the relationship between validity and reliability? The two do not necessarily go hand-in-hand.
It is possible to have a measure that has high reliability but low validity - one that is consistent in getting bad information or consistent in missing the mark. It is also possible to have one that has low reliability and low validity - inconsistent and not on target. Finally, it is not possible to have a measure that has low reliability and high validity - you can't really get at what you are interested in if your measure fluctuates wildly.

Is reliability necessary for validity?
If test scores are not reliable, they cannot be valid, since they will not provide a good estimate of the ability or trait that the test intends to measure. Reliability is therefore a necessary but not sufficient condition for validity. Reliability refers to the consistency or repeatability of the test scores.
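As a rough illustration of what repeatability means in practice, test-retest reliability is often summarized as the correlation between scores from two administrations of the same measure. The sketch below is a minimal example under that assumption. The function name pearson_r and the scores are hypothetical and do not come from any particular study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five respondents, measured on two occasions.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

A correlation near 1 suggests the measure ranks respondents consistently across occasions. What counts as "high enough" depends on the field and the purpose of the measure.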
How does low reliability affect validity?
Reliability is integral to validity. Conceptually, you can't draw valid conclusions from the results of a survey or test if the data aren't reliable. Technically, in the calculation of validity coefficients, the degree of reliability in a set of scores sets a ceiling on their validity.
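The ceiling on validity can be made concrete with a standard result from classical test theory. That result is an assumption here, since the text does not name a formula: the correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The function name max_validity and the reliability values below are hypothetical.

```python
from math import sqrt


def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on a validity coefficient under classical test theory.

    Both arguments are reliability coefficients between 0 and 1. The
    criterion defaults to 1.0, meaning a perfectly reliable criterion.
    """
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliability coefficients must lie in [0, 1]")
    return sqrt(test_reliability * criterion_reliability)


# Hypothetical reliabilities, from a consistent test to a very noisy one.
for reliability in (0.9, 0.5, 0.1):
    ceiling = max_validity(reliability)
    print(f"reliability {reliability:.1f} -> validity at most {ceiling:.2f}")
```

Under this bound, a test with reliability 0.1 cannot correlate with any criterion above about 0.32, however well its content matches the concept. A high ceiling does not guarantee validity, however; it only permits it.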
Why can't a test be valid if it is not reliable?
Although a test can be reliable without being valid, it cannot be valid without being reliable. If a test is inconsistent in its measurements, we cannot say it is measuring what it is intended to measure and, therefore, it is considered invalid.
Can a test with low reliability still be valid?
A test is valid if it measures what it is supposed to measure. If the results of a personality test claimed that a very shy person was in fact outgoing, the test would be invalid. Reliability and validity are distinct properties, so a measurement may be reliable but not valid. The reverse does not hold: a measurement with low reliability cannot be highly valid.