American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: AERA.
Banks, K. (2009). Using DDF in a post hoc analysis to understand sources of DIF. Educational Assessment, 14, 103–118.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
DeMars, C. E. (2010). Type I error inflation for detecting DIF in the presence of impact. Educational and Psychological Measurement, 70, 961–972.
Dorans, N. J., & Cook, L. L. (Eds.). (2016). Fairness in educational assessment and measurement. New York, NY: Taylor & Francis.
Dorans, N. J., Schmitt, A. P., & Bleistein, C. A. (1992). The standardization approach to assessing comprehensive differential item functioning. Journal of Educational Measurement, 29(4), 309–319.
Douglas, J., Roussos, L., & Stout, W. (1996). Item bundle DIF hypothesis testing: Identifying suspect bundles and assessing their DIF. Journal of Educational Measurement, 33, 465–484.
Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36–49.
Green, B. F., Crone, C. R., & Folk, V. G. (1989). A method for studying differential distractor functioning. Journal of Educational Measurement, 26(2), 147–160.
Guler, N., & Penfield, R. D. (2009). A comparison of the logistic regression and contingency table methods for simultaneous detection of uniform and nonuniform DIF. Journal of Educational Measurement, 46(3), 314–329.
Harwell, M., Stone, C. A., Hsu, T.-C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101–125.
Hidalgo, M. D., & López-Pina, J. P. (2004). Differential item functioning detection and effect size: A comparison between logistic regression and Mantel–Haenszel procedures. Educational and Psychological Measurement, 64, 903–915.
Holland, P. W., & Thayer, D. T. (Eds.). (1988). Differential item performance and the Mantel-Haenszel procedure. Hillsdale, NJ: Erlbaum.
Hutchinson, T. P. (1991). Ability, partial information, and guessing: Statistical modeling applied to multiple-choice tests. Rundle Mall: Rumsby Scientific.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329–349.
Kato, K., Moen, R. E., & Thurlow, M. L. (2009). Differentials of a state reading assessment: Item functioning, distractor functioning, and omission frequency for disability categories. Educational Measurement: Issues and Practice, 28(2), 28–40.
Kim, S.-H., & Cohen, A. S. (1995). A comparison of Lord's chi-square, Raju's area measures, and the likelihood ratio test on detection of differential item functioning. Applied Measurement in Education, 8(4), 291–312.
Li, Z. (2014). Power and sample size calculations for logistic regression tests for differential item functioning. Journal of Educational Measurement, 51(4), 441–462.
Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Mapuranga, R., Dorans, N. J., & Middleton, K. (2008). A review of recent developments in differential item functioning (Research Report). Princeton, NJ: ETS.
Middleton, K., & Laitusis, C. C. (2007). Examining test items for differential distractor functioning among students with learning disabilities (Research Report). Princeton, NJ: ETS.
Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257–274.
Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of Statistics on Psychometrics, Vol. 26 (pp. 125–167). Elsevier B.V.
Penfield, R. D. (2008). An odds ratio approach for assessing differential distractor functioning effects under the nominal response model. Journal of Educational Measurement, 45, 247–269.
Penfield, R. D. (2010). Modeling DIF effects using distractor-level invariance effects: Implications for understanding the causes of DIF. Applied Psychological Measurement, 34(3), 151–165.
Penfield, R. D. (2016). Fairness in test scoring. In N. J. Dorans & L. L. Cook (Eds.), Fairness in educational assessment and measurement (pp. 125–167). New York, NY: Taylor & Francis.
Reif, M. (2015). mcIRT: Software package for multiple choice items IRT models. Available at https://github.com/manuelreif/mcIRT
Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type I error performance. Journal of Educational Measurement, 33, 215–230.