Document Type: Research Paper

Authors

1 Associate Professor, Allameh Tabataba'i University

2 Allameh Tabataba'i University

Abstract

This study examined Differential Item Functioning (DIF) in the Iranian TEFL MA Entrance Exam using two statistical methods: Logistic Regression (LR) and the Mantel-Haenszel (MH) procedure. The items flagged for DIF were then subjected to a content analysis to explore potential linguistic sources of such bias. To this end, the answer sheets of 2217 female and 735 male examinees from the 2015 administration were analyzed. The LR technique flagged eight items as exhibiting DIF: four favoured men and four favoured women. The MH procedure flagged eleven items, of which six favoured male test takers and five favoured female test takers. The content analysis of the flagged items did not reveal any particular linguistic source for this differential behaviour.
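For readers unfamiliar with the MH procedure, the following is a minimal sketch of how the MH common odds ratio for a single item can be computed. It is an illustration only, not the authors' analysis code. It assumes examinees are matched on total test score, and the function name `mantel_haenszel_dif` and the group labels `"ref"` and `"focal"` are hypothetical. The value -2.35 ln(alpha) is the delta-scale transformation commonly reported with MH results.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Return (alpha_MH, MH D-DIF) for one item.

    records: iterable of (group, total_score, correct) tuples, where
    group is "ref" or "focal" (hypothetical labels), total_score is the
    matching variable (an assumption of this sketch), and correct is a bool.
    """
    # One 2x2 table per score level: [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][idx] += 1

    num = 0.0  # sum over strata of A*D/N
    den = 0.0  # sum over strata of B*C/N
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")

    alpha = num / den
    # Delta-scale effect size; negative values indicate an item
    # that favours the reference group.
    return alpha, -2.35 * math.log(alpha)
```

An odds ratio above 1 indicates that, among examinees with the same total score, the reference group has higher odds of answering the item correctly. A full analysis would also apply a significance test to each item, which this sketch omits.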

Keywords
