Reliability
Reliability refers to the consistency of test scores. In psychological testing, scores must demonstrate acceptable levels of consistency to be meaningful. This chapter presents a variety of methods used to estimate the reliability of scores, along with an overview of how they are calculated, when they can be used, and how they can be interpreted. These methods include test-retest reliability, alternate-form reliability, inter-rater reliability, reliability of composite scores, and reliability of difference scores. Central to the measurement of reliability is measurement error, and the standard error of measurement is reviewed as one method of quantifying that error. Modern test theories, including generalizability theory and item response theory, are also introduced. Finally, the chapter provides a practical strategy educators can use to estimate the reliability of classroom test scores, as well as an example of how a commercially available ability test reports reliability information.
It is the user who must take responsibility for determining whether scores are sufficiently trustworthy to justify anticipated uses and interpretations for particular uses.
AERA et al. (2014, p. 41)
References
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
- Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Belmont, CA: Wadsworth.
- Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16(2), 137–163.
- Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
- Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). New York, NY: HarperCollins.
- Diederich, P. B. (1973). Short-cut statistics for teacher-made tests. Princeton, NJ: Educational Testing Service.
- Dudek, F. (1979). The continuing misinterpretation of the standard error of measurement. Psychological Bulletin, 86(2), 335–337.
- Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. San Francisco, CA: W.H. Freeman.
- Glutting, J., McDermott, P., & Stanley, J. (1987). Resolving differences among methods of establishing confidence limits for test scores. Educational and Psychological Measurement, 47(3), 607–614.
- Gronlund, N. E. (2003). Assessment of student achievement (7th ed.). Boston, MA: Allyn & Bacon.
- Guilford, J. (1936). Psychometric methods. New York, NY: McGraw-Hill.
- Gulliksen, H. (1950). Theory of mental tests. New York, NY: Wiley.
- Hays, W. (1994). Statistics (5th ed.). New York, NY: Harcourt Brace.
- Hoover, H. D., Dunbar, S. B., & Frisbie, D. A. (2001). Iowa Tests of Basic Skills. Itasca, IL: Riverside.
- Hopkins, K. D. (1998). Educational and psychological measurement and evaluation (8th ed.). Boston, MA: Allyn & Bacon.
- Kamphaus, R. W. (2005). Clinical assessment of child and adolescent intelligence. New York, NY: Springer.
- Kaufman, A. S., & Lichtenberger, E. O. (1999). Essentials of WAIS-III assessment. New York, NY: Wiley.
- Keith, T. Z., & Reynolds, C. R. (1990). Measurement and design issues in child assessment research. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children: Intelligence and achievement (pp. 29–62). New York, NY: Guilford Press.
- Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of reliability. Psychometrika, 2, 151–160.
- Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Upper Saddle River, NJ: Prentice Hall.
- Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
- Magnusson, D. (1967). Test theory. Reading, MA: Addison-Wesley.
- Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
- Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
- Osterlind, S. J. (2006). Modern measurement: Theory, principles, and applications of mental appraisal. Upper Saddle River, NJ: Pearson.
- Reynolds, C. R. (1999). Inferring causality from relational data and design: Historical and contemporary lessons for research and clinical practice. The Clinical Neuropsychologist, 13, 386–395.
- Reynolds, C. R., & Kamphaus, R. W. (2015). Reynolds Intellectual Assessment Scales (2nd ed.). Lutz, FL: Psychological Assessment Resources.
- Roid, G. H. (2003). Stanford-Binet Intelligence Scales (5th ed.). Itasca, IL: Riverside.
- Saupe, J. L. (1961). Some useful estimates of the Kuder-Richardson formula number 20 reliability coefficient. Educational and Psychological Measurement, 21, 63–72.
- Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.
- Spearman, C. (1907). Demonstration of formulae for true measurement of correlation. The American Journal of Psychology, 18(2), 161–169.
- Spearman, C. (1913). Correlations of sums or differences. British Journal of Psychology, 5, 417–426.
- Thorndike, R. (1949). Personnel selection: Test and measurement techniques. Oxford, England: Wiley.
- Thurstone, L. (1931). The reliability and validity of tests: Derivation and interpretation of fundamental formulae concerned with reliability and validity of tests and illustrative problems. Ann Arbor, MI: Edwards Brothers.
- Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd ed.). San Antonio, TX: Psychological Corporation.
- Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604.
Recommended Reading
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. Chapter 2: Reliability/Precision and Errors of Measurement is a great resource!
- Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). Upper Saddle River, NJ: Merrill/Prentice Hall. A little technical at times, but a great resource for students wanting to learn more about reliability.
- Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. San Francisco, CA: W.H. Freeman. Chapters 8 and 9 provide outstanding discussions of reliability. A classic!
- Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill. Chapter 6: The Theory of Measurement Error and Chapter 7: The Assessment of Reliability are outstanding chapters! Another classic!
- Subkoviak, M. J. (1984). Estimating the reliability of mastery-nonmastery classifications. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 267–291). Baltimore, MD: Johns Hopkins University Press. An excellent discussion of techniques for estimating the consistency of classification with mastery tests.
Author information
- Cecil R. Reynolds, Austin, TX, USA
- Robert A. Altmann, Minneapolis, MN, USA
- Daniel N. Allen, Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV, USA