Document Type: Research Paper

Authors

1 Faculty of Psychology and Education, University of Tehran

2 Faculty of Psychology and Education, University of Tehran

3 Faculty of Psychology and Education, University of Allame

4 Faculty of Education, University of Saskatchewan

Abstract

To make scores in test batteries easier to interpret and compare, the raw scores on each test are converted to a common scale, called the scale score. Raw scores can be converted to scale scores by linear or nonlinear methods; the conventional nonlinear methods are normalization and the arcsine transformation. This study compared the standard error of measurement under these two nonlinear methods, using 10,000 randomly simulated observations and a random sample of 10,000 real observations drawn from applicants to the Iranian university entrance exam. The methods were compared using the conditional standard error of measurement (CSEM), frequency charts, and statistical indices such as moments. The results showed that the two methods have different properties. Although scores under both methods were highly reliable and accurate, the arcsine method reduced the fluctuation of score error across score levels, and the mean standard error of measurement was lower for arcsine scale scores than for normalized scale scores.
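
The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the Freeman-Tukey form of the arcsine transformation, Lord's binomial error model for the raw-score CSEM, and normalization through mid-percentile ranks; the abstract does not state which variants the study actually used, and the function names below are hypothetical.

```python
import math
from statistics import NormalDist


def arcsine_transform(x, k):
    """Freeman-Tukey arcsine transformation of raw score x on a k-item test.

    Assumed form: 0.5 * (asin(sqrt(x/(k+1))) + asin(sqrt((x+1)/(k+1)))).
    """
    return 0.5 * (math.asin(math.sqrt(x / (k + 1)))
                  + math.asin(math.sqrt((x + 1) / (k + 1))))


def lord_csem(x, k):
    """Raw-score CSEM under Lord's binomial error model (an assumption here)."""
    return math.sqrt(x * (k - x) / (k - 1))


def normalized_scores(freqs):
    """Map raw scores 0..k to z-scores through mid-percentile ranks.

    freqs[i] is the number of examinees with raw score i.
    """
    n = sum(freqs)
    below = 0
    z_scores = []
    for f in freqs:
        p = (below + 0.5 * f) / n
        # Clamp so the inverse normal CDF stays finite at empty tails.
        p = min(max(p, 1e-6), 1 - 1e-6)
        z_scores.append(NormalDist().inv_cdf(p))
        below += f
    return z_scores
```

Under the binomial model the raw-score CSEM is largest near the middle of the score range and zero at the extremes; the arcsine transformation is commonly used because it tends to even out that error variance across score levels, which is consistent with the reduced fluctuation reported above.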

Keywords
