Reliability and Validity
Not only must a good test be standardized, but it must also be reliable and valid.


A test is reliable if it yields consistent results: we should obtain the same score no matter where, when, or how many times we take it (if other variables remain the same). Several methods are used to determine whether a test is reliable. In the test-retest method, the same exam is administered to the same group on two different occasions, and the two sets of scores are correlated. The closer the correlation coefficient is to 1.0, the more reliable the test. The problem with this method of determining reliability, or consistency, is that performance on the second test may be better because test takers are already familiar with the questions and test procedures. In the split-half method, the score on one half of the test questions is correlated with the score on the other half to see whether they are consistent. One way to do that is to compare the score on all the odd-numbered questions with the score on all the even-numbered questions. In the alternate form method, also called the equivalent form method, two different versions of a test on the same material are given to the same test takers, and the scores are correlated. For example, the SAT given on Saturday is different from the SAT given on Sunday in October; there are different questions on each form. Although this does not actually happen, if the same people took both exams and the tests were highly reliable, their scores should be nearly the same on both. Such consistency also requires high interrater reliability, the extent to which two or more scorers evaluate the responses in the same way.
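
The correlations described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: all scores are made-up data, the function names `pearson_r` and `split_half_scores` are hypothetical, and the correlation is computed with the ordinary Pearson formula, which is one common choice and not necessarily the one a given test publisher uses.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_scores(item_scores):
    """Return each test taker's totals on odd- and even-numbered questions."""
    odd = [sum(items[0::2]) for items in item_scores]   # questions 1, 3, 5, ...
    even = [sum(items[1::2]) for items in item_scores]  # questions 2, 4, 6, ...
    return odd, even


# Test-retest: hypothetical scores for five people on two occasions
first_attempt = [82, 75, 91, 68, 88]
second_attempt = [85, 78, 93, 70, 90]
test_retest_r = pearson_r(first_attempt, second_attempt)

# Split-half: hypothetical item results (1 = correct, 0 = incorrect)
# for five people on a six-question test
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
]
odd_totals, even_totals = split_half_scores(item_scores)
split_half_r = pearson_r(odd_totals, even_totals)

print("test-retest r:", round(test_retest_r, 3))
print("split-half r:", round(split_half_r, 3))
```

In this made-up data, everyone scores slightly higher on the second attempt, yet the test-retest correlation stays close to 1.0 because the rank order and spacing of the scores barely change. The same `pearson_r` function could be applied to scores from two alternate forms taken by the same people.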


Validity is the extent to which an instrument accurately measures or predicts what it is supposed to measure or predict. Tests can be very reliable, but if they are not also valid, they are useless for measuring the particular construct or behavior. Psychometricians must present data to show that a test accurately measures what it is supposed to measure and that the results can be used to make accurate decisions. Because there are no universal standards against which test scores can be compared, validation is most frequently accomplished by obtaining high correlations between the test and other assessments. Just as there are several methods for measuring reliability, there are also several methods for measuring validity.