Document Type: Research Paper

Authors

1 Ph.D. Student in Assessment and Measurement, Allameh Tabataba’i University, Tehran, Iran

2 Associate Professor, Department of Assessment and Measurement, Allameh Tabataba’i University, Tehran, Iran

3 Professor, Department of Assessment and Measurement, University of Tehran, Tehran, Iran

4 Professor, Department of Assessment and Measurement, Allameh Tabataba’i University, Tehran, Iran

Abstract

Identifying distractors as sources of Differential Item Functioning (DIF) in polytomous items is of great importance to test designers and analysts. Although DIF analysis is a common method for examining measurement invariance, it comes with challenges and limitations, especially for multiple-choice items. The purpose of this study was to assess the performance of the Nested Logit Model (NLM) for detecting Differential Distractor Functioning (DDF), using both experimental (simulated data) and descriptive-analytical (real data) methods. Six items were simulated under different conditions of difficulty and slope, ability distribution, presence or absence of DIF/DDF, and DIF/DDF magnitude, with a sample size of 2,000 and 50 replications. The real data consisted of the Mathematics Entrance Exam (D-form, 2018), with a random sample of 2,000 men and women. Based on the simulation results, the NLM detected 88% of DIF and 97% of DDF cases on average. Type I error rates were very close to their theoretical expected values, although some inflation appeared under unequal ability distributions. According to the findings, the detection rate was influenced by the item parameters (difficulty and slope) and by the magnitude of DIF or DDF. In the real-data analysis, two items showed both DIF (large and medium) and DDF (partial to moderate) simultaneously, whereas the Nominal Response Model (NRM) approach flagged 11 items as showing DIF/DDF; as expected, the approach based on the "divided-by-distractor" strategy flagged fewer items. By separating the DDF test from the DIF test, the NLM approach allows a clear evaluation of whether a distractor may be responsible for DIF. Since high-stakes tests play a special role in selection, and DIF and DDF analyses are central to establishing the validity and measurement invariance of their items, comprehensive NLM-based DIF/DDF analyses are recommended for screening biased items.
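For illustration only, the following is a minimal Python sketch of the core idea behind an NLM-based DDF check of the kind described above: conditional on an incorrect response, distractor choice is modeled with a multinomial logit in ability, and group invariance of the distractor parameters is tested with a likelihood-ratio statistic. This is not the authors' exact procedure or the software they used; ability is treated as known (the simulated theta values) rather than estimated, only the distractor stage of the nested model is shown, and all function names and parameter values below are illustrative assumptions.

# Minimal sketch (illustration only, not the authors' procedure): the distractor
# stage of a nested-logit DDF check. Conditional on an incorrect response,
# distractor choice follows a multinomial logit in ability (theta); DDF is
# tested by a likelihood-ratio comparison of a group-invariant model against
# a model with group-specific distractor parameters.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(1)
K = 3  # number of distractors on the item

def simulate_distractors(theta, zeta, lam):
    """Draw distractor choices from a multinomial logit, given an incorrect response."""
    logits = zeta[None, :] + theta[:, None] * lam[None, :]
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(K, p=row) for row in p])

def negloglik(params, theta, y):
    # Reference-category parameterization: distractor 0 has zeta = lam = 0.
    zeta = np.concatenate(([0.0], params[:K - 1]))
    lam = np.concatenate(([0.0], params[K - 1:]))
    logits = zeta[None, :] + theta[:, None] * lam[None, :]
    log_norm = np.log(np.exp(logits).sum(axis=1))
    return -(logits[np.arange(len(y)), y] - log_norm).sum()

def fitted_negloglik(theta, y):
    res = minimize(negloglik, np.zeros(2 * (K - 1)), args=(theta, y), method="BFGS")
    return res.fun

# Examinees from the reference and focal groups who answered the item incorrectly.
n = 1000
theta_ref = rng.normal(0.0, 1.0, n)
theta_foc = rng.normal(0.0, 1.0, n)

# Arbitrary distractor parameters: the focal group finds distractor 1 more attractive (DDF).
lam_true = np.array([0.0, -0.5, 0.5])
zeta_ref_true = np.array([0.0, 0.3, -0.3])
zeta_foc_true = np.array([0.0, 1.1, -0.3])

y_ref = simulate_distractors(theta_ref, zeta_ref_true, lam_true)
y_foc = simulate_distractors(theta_foc, zeta_foc_true, lam_true)

# Constrained model: one set of distractor parameters for both groups.
nll_pooled = fitted_negloglik(np.concatenate([theta_ref, theta_foc]),
                              np.concatenate([y_ref, y_foc]))
# Unconstrained model: separate distractor parameters per group.
nll_separate = fitted_negloglik(theta_ref, y_ref) + fitted_negloglik(theta_foc, y_foc)

lr = 2.0 * (nll_pooled - nll_separate)   # likelihood-ratio statistic
df = 2 * (K - 1)                         # extra free parameters in the group-specific model
print(f"LR = {lr:.1f}, df = {df}, p = {chi2.sf(lr, df):.3g}")

With the DDF effect built into the focal group's distractor parameters, the likelihood-ratio statistic is expected to be large and the p-value near zero; setting zeta_foc_true equal to zeta_ref_true illustrates the no-DDF case, where the statistic should follow the chi-square reference distribution.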

Keywords

Differential Item Functioning (DIF); Differential Distractor Functioning (DDF); Nested Logit Model (NLM); measurement invariance; multiple-choice items