Assessing the optimal method of detecting Differential Item Functioning in Computerized Adaptive Testing

Sharifi, Negar; Falsafi, Mohammad; Farokhi, Noorali; Jamali, Ehsan

doi:10.22054/jem.2019.11109.1323

Document Type: Research Paper

Affiliations

¹ ATU

² Sanjesh

Abstract

Background: Test fairness is one of the main challenges in the transition from paper-and-pencil testing to computerized adaptive testing (CAT).

Aim: This study investigated differential item functioning (DIF) in CAT, examined the factors that influence the identification of DIF, and sought the optimal method for detecting DIF in CAT.

Method: Given the nature of the research question, an empirical approach was adopted; data were generated and the study variables manipulated by simulation. The responses of 1,000 examinees (500 in the reference group and 500 in the focal group) to a bank of 55 dichotomous items were simulated under the three-parameter logistic (3PL) model, with 20 replications. Fifteen items were manipulated with respect to DIF type and magnitude, and test impact was operationalized as a mean difference between the comparison groups. A 30-item computerized adaptive test was administered using the Firestar software package. DIF was analyzed with logistic regression (LR) and the item response theory likelihood ratio test (IRT-LRT), and the two methods were compared in terms of power and Type I error rate.

Results: The Type I error rate of IRT-LRT was lower than that of LR. The power of both methods depended on the type and magnitude of DIF and on test impact. Compared with LR, IRT-LRT had greater power to detect uniform DIF under both the impact and no-impact conditions, and its power increased with the magnitude of DIF. The two methods did not differ in detecting non-uniform DIF, and both performed poorly.

Conclusion: Given its power and Type I error rate, IRT-LRT is the optimal approach for detecting uniform DIF. Detecting non-uniform DIF, however, requires further investigation.
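To make the LR procedure and the power and Type I error criteria more concrete, here is a minimal Python sketch of the LR half of the comparison. It assumes the common nested-model form of LR DIF detection: matching score only, then adding a group term, then adding a score-by-group interaction. Uniform DIF is tested through the group term and non-uniform DIF through the interaction term, each with a 1-df likelihood ratio chi-square. This is not the authors' code, and it simplifies their design in several ways. A fixed 30-item linear test stands in for the CAT, so there is no adaptive item selection and no Firestar. Examinees are matched on standardized total score rather than on CAT ability estimates. Item parameters are drawn from arbitrary distributions, the scaling constant D = 1.7 is assumed, and only uniform DIF on a single item is simulated. All function names and defaults (`power_and_type1`, `lr_dif_pvalues`, `dif_shift=0.6`, and so on) are hypothetical, and the IRT-LRT is not shown.

```python
import math
import random

# Sketch only: all names and defaults are hypothetical. A fixed linear test stands in
# for the CAT, and examinees are matched on total score rather than CAT ability estimates.
D = 1.7  # logistic scaling constant (a common convention, assumed here)

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def sigmoid(z):
    return math.exp(log_sigmoid(z))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=30, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad, hess = [0.0] * k, [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for m in range(k):
                    hess[j][m] += p * (1.0 - p) * xi[j] * xi[m]
        step = solve(hess, grad)  # Newton step: (X'WX)^{-1} X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    etas = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    return sum(log_sigmoid(e) if yi == 1 else log_sigmoid(-e) for e, yi in zip(etas, y))

def lr_dif_pvalues(responses, groups, item):
    """Nested-model LR DIF test for one item; returns (p_uniform, p_nonuniform)."""
    totals = [sum(r) for r in responses]
    mu = sum(totals) / len(totals)
    sd = math.sqrt(sum((t - mu) ** 2 for t in totals) / len(totals))
    z = [(t - mu) / sd for t in totals]  # standardized matching score
    y = [r[item] for r in responses]
    ll1 = fit_logistic([[1.0, zi] for zi in z], y)                              # score only
    ll2 = fit_logistic([[1.0, zi, g] for zi, g in zip(z, groups)], y)          # + group
    ll3 = fit_logistic([[1.0, zi, g, zi * g] for zi, g in zip(z, groups)], y)  # + interaction
    # Each comparison is a 1-df chi-square; its upper tail is erfc(sqrt(x / 2)).
    p_uniform = math.erfc(math.sqrt(max(2.0 * (ll2 - ll1), 0.0) / 2.0))
    p_nonuniform = math.erfc(math.sqrt(max(2.0 * (ll3 - ll2), 0.0) / 2.0))
    return p_uniform, p_nonuniform

def simulate(n_per_group, items, dif_item, dif_shift, impact, rng):
    """Simulate 3PL responses; the focal group has mean -impact and a harder dif_item."""
    responses, groups = [], []
    for g, mean in ((0, 0.0), (1, -impact)):
        for _ in range(n_per_group):
            theta = rng.gauss(mean, 1.0)
            row = []
            for i, (a, b, c) in enumerate(items):
                if g == 1 and i == dif_item:
                    b += dif_shift  # uniform DIF: shift difficulty for the focal group only
                row.append(1 if rng.random() < p_3pl(theta, a, b, c) else 0)
            responses.append(row)
            groups.append(g)
    return responses, groups

def power_and_type1(reps=20, n_per_group=500, n_items=30, dif_shift=0.6,
                    impact=0.0, alpha=0.05, seed=1):
    """Uniform-DIF rejection rates for a DIF item (power) and a DIF-free twin (Type I error)."""
    rng = random.Random(seed)
    hits = false_alarms = 0
    for _ in range(reps):
        items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0), rng.uniform(0.1, 0.25))
                 for _ in range(n_items)]
        items[0] = items[1] = (1.2, 0.0, 0.2)  # item 0 carries DIF; item 1 does not
        responses, groups = simulate(n_per_group, items, 0, dif_shift, impact, rng)
        hits += lr_dif_pvalues(responses, groups, 0)[0] < alpha
        false_alarms += lr_dif_pvalues(responses, groups, 1)[0] < alpha
    return hits / reps, false_alarms / reps
```

For instance, `power_and_type1(reps=20, impact=0.5)` keeps the 20 replications and 500-per-group sizes described above while giving the focal group a lower mean. A non-uniform DIF condition would use the second value returned by `lr_dif_pvalues`. It would also require changing `simulate` so that the groups differ in the `a` parameter rather than only in `b`.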



Quarterly of Educational Measurement

Volume 9, Issue 33
October 2018
Pages 23-51
