Bio


Dr. Haertel is an expert in the area of educational testing and assessment. His research and teaching focus on psychometrics and educational policy, especially test-based accountability and related policy uses of test data. His recent work has examined standard setting methods, limitations of value-added models for teacher and school accountability, impacts of testing on curriculum, students, and educational policy, test reliability, and generalizability theory.

Administrative Appointments


  • Jacks Family Professor of Education, Emeritus, Stanford Graduate School of Education (2013 - Present)
  • Jacks Family Professor of Education, Stanford Graduate School of Education (2008 - 2012)
  • Associate Dean for Faculty Affairs, Stanford Graduate School of Education (2005 - 2010)
  • Professor of Education, Stanford Graduate School of Education (1992 - 2008)
  • Associate Professor of Education, Stanford Graduate School of Education (1987 - 1992)
  • Assistant Professor of Education, Stanford Graduate School of Education (1980 - 1987)

Boards, Advisory Committees, Professional Organizations


  • Member, Technical Design Group, California Department of Education, Assessment and Accountability Unit (2015 - Present)
  • Member, Smarter Balanced Assessment Consortium Technical Advisory Committee (2019 - Present)
  • Assistant Professor, University of Illinois, Chicago (1979 - 1980)

Professional Education


  • PhD, University of Chicago, Measurement, Evaluation and Statistical Analysis (1980)
  • BA, University of Wisconsin-Madison, Mathematics (1971)

Research Interests


  • Assessment, Testing and Measurement
  • International and Comparative Education
  • School Reform
  • Standards
  • Teachers and Teaching

Current Research and Scholarly Interests


Functions of test scores in discourse about education; how testing shapes ideas of success and failure for students, schools, and public education as a whole.

All Publications


  • Comparability of Large-Scale Educational Assessments: Issues and Recommendations. edited by Berman, A. I., Haertel, E. H., Pellegrino, J. W. National Academy of Education. 2020

    DOI: 10.31094/2020/1

  • The Testing Charade: Pretending to Make Schools Better (Book Review) AMERICAN JOURNAL OF EDUCATION Haertel, E. H. 2018; 124 (3): 373–77
  • Measuring Cultural Dimensions of Classroom Interactions EDUCATIONAL ASSESSMENT Jensen, B., Grajeda, S., Haertel, E. 2018; 23 (4): 250–76
  • Tests, Test Scores, and Constructs EDUCATIONAL PSYCHOLOGIST Haertel, E. H. 2018; 53 (3): 203–16
  • Fairness using derived scores Fairness in Educational Assessment and Measurement Haertel, E., Ho, A. Routledge. 2016: 233–254
  • Engaging methodological pluralism Handbook of research on teaching Moss, P. A., Haertel, E. H. 2016: 127-247
  • Selection of Common Items as an Unrecognized Source of Variability in Test Equating: A Bootstrap Approximation Assuming Random Sampling of Common Items APPLIED MEASUREMENT IN EDUCATION Michaelides, M. P., Haertel, E. H. 2014; 27 (1): 46-57
  • Getting the Help We Need JOURNAL OF EDUCATIONAL MEASUREMENT Haertel, E. 2013; 50 (1): 84-90

    DOI: 10.1111/jedm.12002

    Web of Science ID: 000316286300003

  • Reliability and Validity of Inferences about Teachers Based on Student Scores. William H. Angoff Memorial Lecture Series. Educational Testing Service Haertel, E. H. 2013
  • Improving ability measurement in surveys by following the principles of IRT: The Wordsum vocabulary test in the General Social Survey SOCIAL SCIENCE RESEARCH Cor, M. K., Haertel, E., Krosnick, J. A., Malhotra, N. 2012; 41 (5): 1003-1016

    Abstract

    Survey researchers often administer batteries of questions to measure respondents' abilities, but these batteries are not always designed in keeping with the principles of optimal test construction. This paper illustrates one instance in which following these principles can improve a measurement tool used widely in the social and behavioral sciences: the GSS's vocabulary test called "Wordsum". This ten-item test is composed of very difficult items and very easy items, and item response theory (IRT) suggests that the omission of moderately difficult items is likely to have handicapped Wordsum's effectiveness. Analyses of data from national samples of thousands of American adults show that after adding four moderately difficult items to create a 14-item battery, "Wordsumplus" (1) outperformed the original battery in terms of quality indicators suggested by classical test theory; (2) reduced the standard error of IRT ability estimates in the middle of the latent ability dimension; and (3) exhibited higher concurrent validity. These findings show how to improve Wordsum and suggest that analysts should use a score based on all 14 items instead of using the summary score provided by the GSS, which is based on only the original 10 items. These results also show more generally how surveys measuring abilities (and other constructs) can benefit from careful application of insights from the contemporary educational testing literature.

    DOI: 10.1016/j.ssresearch.2012.05.007

    Web of Science ID: 000306620600001

    PubMedID: 23017913

  • Evaluating teacher evaluation PHI DELTA KAPPAN Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., Rothstein, J. 2012; 93 (6): 8-15
  • The briefing book method Setting performance standards: Foundations, methods, and innovations Haertel, E. H., Beimers, J. N., Miles, J. A. 2012: 283-299
  • The Effect of Ignoring Classroom‐Level Variance in Estimating the Generalizability of School Mean Scores Educational Measurement: Issues and Practice Wei, X., Haertel, E. 2011; 30 (1): 13-22
  • Medicine on a need-to-know basis NATURE IMMUNOLOGY Busch, R., Byrne, B., Gandrud, L., Sears, D., Meyer, E., Kattah, M., Kurihara, C., Haertel, E., Parnes, J. R., Mellins, E. D. 2006; 7 (6): 543-547

    Abstract

    Disease-oriented, introductory medical curricula can help overcome educational and institutional barriers that separate aspiring translational scientists in PhD programs from the world of medicine.

    Web of Science ID: 000237751200004

    PubMedID: 16715061

  • The effects of content, format, and inquiry level on science performance assessment scores APPLIED MEASUREMENT IN EDUCATION Stecher, B. M., Klein, S. P., Solano-Flores, G., McCaffrey, D., Robyn, A., Shavelson, R. J., Haertel, E. 2000; 13 (2): 139-160
  • Performance assessment and education reform PHI DELTA KAPPAN Haertel, E. H. 1999; 80 (9): 662-666
  • Gender and racial/ethnic differences on performance assessments in science EDUCATIONAL EVALUATION AND POLICY ANALYSIS Klein, S. P., Jovanovic, J., Stecher, B. M., McCaffrey, D., Shavelson, R. J., Haertel, E., Solano-Flores, G., Comfort, K. 1997; 19 (2): 83-97
  • Generalizability analysis for performance assessments of student achievement or school effectiveness EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT Cronbach, L. J., Linn, R. L., Brennan, R. L., Haertel, E. H. 1997; 57 (3): 373-399
  • Components of interesting science experiments SCIENCE EDUCATION Martinez, M. E., Haertel, E. 1991; 75 (4): 471-479
  • I never promised you 1st place - a rejoinder PHI DELTA KAPPAN Bradburn, N., Haertel, E., Schwille, J., Torney-Purta, J. 1991; 72 (10): 774-777
  • Continuous and discrete latent structure models for item response data PSYCHOMETRIKA Haertel, E. H. 1990; 55 (3): 477-494
  • Using restricted latent class models to map the skill structure of achievement items JOURNAL OF EDUCATIONAL MEASUREMENT Haertel, E. H. 1989; 26 (4): 301-321
  • Buyers beware - the deceptively high cost of LISREL COUNSELING PSYCHOLOGIST Haertel, E. H., Thoresen, C. E. 1987; 15 (2): 316-319
  • Measuring school performance to improve school practice EDUCATION AND URBAN SOCIETY Haertel, E. 1986; 18 (3): 312-325
  • Construct-validity and criterion-referenced testing REVIEW OF EDUCATIONAL RESEARCH Haertel, E. 1985; 55 (1): 23-46
  • Detection of a skill dichotomy using standardized achievement-test items JOURNAL OF EDUCATIONAL MEASUREMENT Haertel, E. 1984; 21 (1): 59-72
  • An application of latent class models to assessment data APPLIED PSYCHOLOGICAL MEASUREMENT Haertel, E. 1984; 8 (3): 333-346
  • School-achievement - thinking about what to test JOURNAL OF EDUCATIONAL MEASUREMENT Haertel, E., Calfee, R. 1983; 20 (2): 119-132
  • The impact of leisure-time television on school learning - a research synthesis AMERICAN EDUCATIONAL RESEARCH JOURNAL Williams, P. A., Haertel, E. H., Haertel, G. D., Walberg, H. J. 1982; 19 (1): 19-50