Document Type: Research Paper

Abstract

This study examines the common types of assessment used in the education system. Test results can be interpreted with confidence only when the instrument satisfies the requirements of validity and reliability and each test covers the aspect of performance it is intended to measure; a poorly constructed instrument is not only useless but can be harmful. A test must therefore be carefully constructed, administered, and scored before its results are read. To ensure fairness, scores obtained from different test forms are adjusted through procedures commonly referred to as equating. Equating is a statistical method for adjusting scores on different forms of a test so that unwanted differences in form difficulty are accounted for and the resulting scores are comparable; the nonequivalent groups with anchor test (NEAT) design, widely used in national educational assessment, is described in this context. The purpose of this study is to compare the mean, linear, and equipercentile equating methods of classical test theory, applied under the matched groups with anchor test design and the nonequivalent groups with anchor test design, with the results of equating based on the modern theory of measurement (item response theory), and to present and explain, in an appropriate manner, the place of equating in the national educational measurement system.
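As a brief illustrative sketch (not part of the original abstract) of the classical observed-score equating functions named above, let X denote scores on the new form and Y scores on the reference form, with means mu, standard deviations sigma, and cumulative distribution functions F and G; the standard mean, linear, and equipercentile transformations can then be written as:

\begin{align*}
  m_Y(x) &= x - \mu(X) + \mu(Y) && \text{(mean equating)}\\
  l_Y(x) &= \frac{\sigma(Y)}{\sigma(X)}\,\bigl[x - \mu(X)\bigr] + \mu(Y) && \text{(linear equating)}\\
  e_Y(x) &= G^{-1}\!\bigl(F(x)\bigr) && \text{(equipercentile equating)}
\end{align*}

Under the NEAT design, where the two forms are taken by nonequivalent groups, these quantities are estimated with the help of the common anchor test.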

Keywords
