Document Type: Research Paper

Authors

1 Ph.D. Candidate, Faculty of Psychology and Education, Allameh Tabataba'i University, Tehran, Iran.

2 Allameh Tabataba'i University (ATU), Tehran, Iran.

3 Faculty of Psychology and Education, University of Tehran, Tehran, Iran.

4 Allameh Tabataba'i University, Tehran, Iran.

10.22054/jem.2025.83753.3588

Abstract

Conditional Standard Error of Measurement (CSEM), which estimates the standard error of measurement at different score levels, is a critical index of measurement precision and aids the interpretation of reported test scores. This study examined the stability of CSEM under three scaling methods, namely arcsine score transformation, general variance stabilization (gvs), and cubic transformation, across three test formats (multiple-choice, essay, and mixed). Data were drawn from a standardized test combining multiple-choice and essay questions, from which two pseudo-tests were constructed, one for each item format. Results showed that CSEM stability depends on test format and structural features. The arcsine method yielded the most stable CSEM for the multiple-choice test and also performed well for the mixed-format test. The gvs method performed best for the mixed-format test, providing the most stable CSEM with the least error across the ability scale. The cubic method also showed greater stability in the mixed-format test. These findings highlight the need to select a scaling method based on test characteristics and evaluation goals.
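To illustrate why a transformation such as arcsine scaling can stabilize CSEM, the following minimal sketch assumes a simple binomial error model for an n-item test and uses the averaged Freeman-Tukey arcsine transform commonly associated with arcsine scaling. It is not the authors' procedure: the gvs and cubic methods are not shown, and the test length (`n_items = 40`), the grid of true proportion-correct values, and the function names are hypothetical choices for illustration only. Under this model the raw-score CSEM changes with the true score level, while the CSEM of the transformed scores stays nearly constant.

```python
import math


def binom_pmf(n, p):
    """Exact binomial probabilities P(X = x) for x = 0..n."""
    return [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def arcsine_score(x, n):
    """Averaged Freeman-Tukey arcsine transform of raw score x on an n-item test.

    Illustrative assumption; not necessarily the transform used in the study.
    """
    return 0.5 * (math.asin(math.sqrt(x / (n + 1)))
                  + math.asin(math.sqrt((x + 1) / (n + 1))))


def conditional_sem(n, p, transform=None):
    """SD of observed scores for an examinee with true proportion-correct p,
    under a binomial error model (raw scores when transform is None)."""
    pmf = binom_pmf(n, p)
    values = [transform(x, n) if transform else x for x in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, values))
    return math.sqrt(var)


if __name__ == "__main__":
    n_items = 40  # hypothetical test length
    for p in (0.2, 0.35, 0.5, 0.65, 0.8):
        raw = conditional_sem(n_items, p)
        arc = conditional_sem(n_items, p, arcsine_score)
        print(f"p={p:.2f}  raw CSEM={raw:.3f}  arcsine CSEM={arc:.4f}")
```

Running the script prints a raw-score CSEM that peaks at p = 0.5 and shrinks toward the extremes, alongside an arcsine-scale CSEM that is close to flat across the same range. Comparing CSEM curves in this way is one simple means of judging how well a scaling method stabilizes measurement error across score levels.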

Keywords