Combining the Reckase Approach and the Mathematical Programming Approach in Designing Optimal Item Pools for Computerized Adaptive Testing

Article type: Research article

Author

Kharazmi University

Abstract

Computerized adaptive testing (CAT) is a method of ability assessment that increases the precision of ability estimation and shortens the test without loss of measurement precision. Nevertheless, adaptive testing works well only if it has an item pool containing a sufficient number of items of adequate quality. Many researchers have noted that, in building an item pool for CAT, not only the size of the pool but also the distribution of the item parameters in the pool is of great importance. Even so, there is little research on how these desirable features are determined. The main aim of this study was to combine the "bin-and-union" idea, taken from the approach of Reckase (2003), which is a Monte Carlo simulation method for determining item pool features, with the mathematical programming approach. In this study, the method was used to construct an optimal item pool for an adaptive mathematics test. The item pool was calibrated with the three-parameter model, and the Sympson-Hetter method was used for item exposure control. The design included estimates of the desired item pool size, the desired distribution of item parameters, and the pool's non-statistical features. The design process involved determining a set of desirable item pool features while taking into account several important factors that could affect the intended results of designing an item pool. The performance of the simulated and operational item pools was compared on a set of evaluation criteria. The evaluation results showed that the mechanism used to determine the desirable item pool features works well and is suitable for this purpose.

Keywords


Article title [English]

Combining Reckase’s Method and the Mathematical Programming Method in Designing an Optimal Item Pool for Computerized Adaptive Tests

Abstract [English]

Computerized adaptive testing (CAT) is a testing procedure that can improve precision for a specified test length or reduce test length with no loss of precision. However, for computerized adaptive tests (CATs) to work well, they must have an item pool with a sufficient number of good-quality items. Many researchers have pointed out that, in developing an item pool for CATs, not only the size of the item pool but also the distribution of item parameters is important. Yet there is little research on how to identify those desirable features. This paper applied and extended the basic idea of the “bin-and-union” method proposed by Reckase (2003), a Monte Carlo method for determining the properties of an optimal item pool, and combined it with a mathematical programming method to develop the optimal item pool for an operational mathematics CAT. The study extended the method to item pools calibrated with the three-parameter logistic model and applied it to situations where the Sympson-Hetter procedure is used to control the item exposure rate. The designs include estimates of the desired item pool size and item parameter distribution. The design process includes identifying a series of candidate item pool features by taking into consideration multiple factors that may affect the desired features of the item pool. The performance of the simulated item pools was compared with that of the operational item pool on a set of evaluation criteria. The results of the evaluation indicated that the mechanism used to identify the desirable item pool features functioned well and is appropriate for identifying the desirable item pool features of the operational mathematics CAT.
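
As a rough illustration of two ingredients named in the abstract, the sketch below computes the three-parameter logistic (3PL) response probability and item information, and applies a Sympson-Hetter style exposure filter during item selection. The function names, the scaling constant D = 1.7, the maximum-information selection rule, and the toy pool are assumptions made for illustration; they are not taken from the study.

```python
import math
import random


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    a: discrimination, b: difficulty, c: pseudo-guessing.
    D = 1.7 is the conventional scaling constant (an assumption here).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c, D=1.7):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_item_sh(theta, pool, k, used, rng):
    """Pick the next item with a Sympson-Hetter style exposure filter.

    pool: list of (a, b, c) tuples (hypothetical toy pool).
    k:    exposure control parameters, one per item, each in [0, 1].
    used: set of item indices already considered for this examinee.
    Items are tried in order of decreasing information; a selected item
    is administered with probability k[i], otherwise it is set aside
    for this examinee and the next best item is tried.
    """
    candidates = sorted(
        (i for i in range(len(pool)) if i not in used),
        key=lambda i: info_3pl(theta, *pool[i]),
        reverse=True,
    )
    for i in candidates:
        used.add(i)  # removed for this examinee whether administered or not
        if rng.random() < k[i]:
            return i
    return None  # no remaining item passed the exposure filter


if __name__ == "__main__":
    pool = [(1.0, 0.0, 0.2), (0.5, 2.0, 0.2), (1.2, -1.0, 0.15)]
    k = [0.5, 1.0, 1.0]
    chosen = select_item_sh(0.0, pool, k, set(), random.Random(0))
    print("administered item index:", chosen)
```

In an operational setting the exposure control parameters `k` are typically tuned through repeated simulation until observed exposure rates fall below a target; that tuning loop is omitted here.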

Keywords [English]

  • optimal item pool
  • computerized adaptive testing
  • Reckase’s Method and Mathematical Programming Method
  • weighted deviations model

References

Brooke, A., Kendrick, D., & Meeraus, A. (1988). GAMS: A user’s guide. Redwood City, CA: The Scientific Press.
Chang, H. (2007). Book review: Linear models for optimal test design. Psychometrika, 72, 279-281.
Chang, H. H., & Ying, Z. (1999). Alpha-stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211-222.
Chang, H. H., & van der Linden, W. J. (2003). Optimal stratification of item pools in a-stratified computerized adaptive testing. Applied Psychological Measurement, 27, 262-274.
Cheng, Y., & Chang, H. (2009). The maximum priority index method for severely constrained item selection in computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 62, 369-383.
Chen, S. Y., Ankenmann, R. D., & Spray, J. A. (1999). Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing (Report No. ACT-RR-99-5). Iowa City, IA: American College Testing Program.
De Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.
Flaugher, R. (2000). Item pools. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 37-59). Mahwah, NJ: Lawrence Erlbaum.
Gu, L. (2007). Designing optimal item pools for computerized adaptive tests with exposure controls. Unpublished doctoral dissertation. Michigan State University.
Gu, L., & Reckase, M. D. (2007). Designing optimal item pools for computerized adaptive tests with Sympson-Hetter exposure control. Paper presented at the 2007 GMAC Conference on Computerized Adaptive Testing, Minneapolis, MN.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hau, K. T., & Chang, H. H. (2001). Item selection in computerized adaptive testing: Should more discriminating items be used first? Journal of Educational Measurement, 38(3), 249-266.
He, W., & Reckase, M. (2010). Optimal item pool design for a highly constrained computerized adaptive test. Unpublished doctoral dissertation. Michigan State University.
He, W., & Reckase, M. (2011). Optimal item pool design for a highly constrained computerized adaptive test. Paper presented at the National Council on Measurement in Education, Denver, CO.
Jensema, C. J. (1972). An application of latent trait mental test theory to the Washington Pre-College Testing Program. Unpublished doctoral dissertation. University of Washington.
Jensema, C. J. (1977). Bayesian tailored testing and the influence of item bank characteristics. Applied Psychological Measurement, 1, 111-120.
Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
McBride, J. R., & Weiss, D. J. (1976). Some properties of a Bayesian adaptive ability testing strategy (Research Rep. No. 76-1). Minneapolis, MN: Psychometric Methods Program, Department of Psychology.
Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356.
Reckase, M. D. (1974). An application of the Rasch simple logistic model to tailored testing. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, Illinois.
Reckase, M. D. (1976). The effect of item pool characteristics on the operation of a tailored testing procedure. Paper presented at the spring meeting of the Psychometric Society, Murray Hill, NJ.
Reckase, M. D. (1989). Adaptive testing: The evolution of a good idea. Educational Measurement: Issues and Practice, 8(3), 11-15.
Reckase, M. D. (2001, September). Item pool design for computerized adaptive tests. Invited small group session at the 6th Conference of the European Association of Psychological Assessment, Aachen, Germany.
Reckase, M. D. (2003). Item pool design for computerized adaptive tests. Paper presented at the National Council on Measurement in Education, Chicago, IL.
Reckase, M. D., & He, W. (2004). The ideal item pool for the NCLEX-RN examination: Report to NCSBN. Michigan State University.
Reckase, M. D., & He, W. (2005). Ideal item pool design for the NCLEX-RN exam. Michigan State University, East Lansing, MI.
Reckase, M. D. (2009). Optimal item pool design for the 2009 NCLEX exam. A report submitted to the National Council of State Boards of Nursing, March 2009.
Reckase, M. D., & He, W. (2009a). Optimal item pool design for the 2009 NCLEX exam: Report to the National Council of State Boards of Nursing (NCSBN). Michigan State University.
Reckase, M. D., & He, W. (2009b). The influence of item pool quality on the functioning of computerized adaptive tests. Paper presented at the annual meeting of the Psychometric Society, Cambridge, U.K.
Reckase, M. D. (2010). Designing item pools to optimize the functioning of a computerized adaptive test. Psychological Test and Assessment Modeling, 52(2), 127-141.
Robin, F., van der Linden, W. J., Eignor, D. R., Steffen, M., & Stocking, M. L. (2005). A comparison of two procedures for constrained adaptive test construction (ETS Research Rep. No. RR-04-39). Princeton, NJ: Educational Testing Service.
Stocking, M. L., & Swanson, L. (1993). A method for severely constrained item selection in adaptive testing. Applied Psychological Measurement, 17, 277-292.
Stocking, M. L., Swanson, L., & Pearlman, M. (1993). Application of an automated item selection method to real data. Applied Psychological Measurement, 17, 167–176.
Stocking, M. L. (1994). Three practical issues for modern adaptive testing item pools (Report No. ETS-RR-94-5). Princeton, NJ: Educational Testing Service.
Sympson, J. B., & Hetter, R. D. (1985, October). Controlling item-exposure rates in computerized adaptive testing. Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973-977). San Diego, CA: Navy Personnel Research and Development Center.
Urry, V. W. (1977). Tailored testing: A successful application of latent trait theory. Journal of Educational Measurement, 14, 181-196.
Van der Linden, W. J. (1998). Optimal assembly of psychological and educational tests. Applied Psychological Measurement, 22, 195-211.
Van der Linden, W. J., & Reese, L. (1998). A model for optimal constrained adaptive testing. Applied Psychological Measurement, 22(3), 259-270.
Van der Linden, W. J., & Glas, C. A. W. (2000a). Capitalization on item calibration error in adaptive testing. Applied Measurement in Education, 13(1), 35-53.
Van der Linden, W. J. (2000b). Constrained adaptive testing with shadow tests. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 27–52). Boston: Kluwer Academic Publishers.
Van der Linden, W. J. (2000c). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24, 225–240.
Van der Linden, W. J. (2005a). A comparison of item-selection methods for adaptive tests with content constraints. Journal of Educational Measurement, 42, 283-302.
Van der Linden, W. J. (2005b). Linear models for optimal test design. New York: Springer-Verlag.
Van der Linden, W. J., Ariel, A., & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31(1), 81-100.
Veldkamp, B. P., & van der Linden, W. J. (2000). Designing item pools for computerized adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 149–162). The Netherlands: Kluwer Academic Publishers.
Wise, S., & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicologica, 21, 135-155. Retrieved from
Xing, D., & Hambleton, R. K. (2004). Impacts of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing examinations. Educational and Psychological Measurement, 64(1), 5-21.