Alper Kose, I., & Demirtasli, N. C. (2012). Comparison of unidimensional and multidimensional models based on item response theory in terms of both variables of test length and sample size. Procedia - Social and Behavioral Sciences, 46: 135–140.
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16: 221–248.
DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2): 145–168.
DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International Journal of Testing, 13(4): 354–378.
Devins, G. M., Beiser, M., Dion, R., Pelletier, L. G., & Edwards, R. G. (1997). Cross-cultural measurements of psychological well-being: The psychometric equivalence of Cantonese, Vietnamese, and Laotian translations of the Affect Balance Scale. American Journal of Public Health, 87: 794–799.
Duan, J. C., Hardle, W. K., & Gentle, J. E. (Eds.). (2012). Handbook of computational finance. Berlin-Heidelberg: Springer.
Fukuhara, H. (2009). A differential item functioning model for testlet-based items using a bi-factor multidimensional item response theory model: A Bayesian approach. A dissertation for the degree of Doctor of Philosophy, Department of Educational Psychology and Learning Systems, Florida State University.
Fukuhara, H., & Kamata, A. (2011). A bifactor multidimensional item response theory model for differential item functioning analysis on testlet-based items. Applied Psychological Measurement, 35(8): 604–622.
Johnson, T. P. (1998). Approaches to equivalence in cross-cultural and cross-national survey research. In J. Harkness (Ed.), Cross-cultural survey equivalence (pp. 1–40). ZUMA-Nachrichten Spezial Band 3. Mannheim: ZUMA.
Kirsch, I., De Jong, J., Lafontaine, D., McQueen, J., Mendelovits, J., & Monseur, C. (2002). Reading for change: Performance and engagement across countries. Results from PISA 2000. New York: Organisation for Economic Cooperation and Development (OECD).
Lenkeit, J., Chan, J., Hopfenbeck, T. N., & Baird, J. A. (2015). A review of the representation of PIRLS related research in scientific journals. Educational Research Review, 16: 102–115. DOI: http://dx.doi.org/10.1016/j.edurev.2015.10.002
Lin, J., & Wu, F. (2003). Differential performance by gender in foreign language testing. The University of Alberta, Centre for Research in Applied Measurement and Evaluation. Retrieved February 19, 2015, from https://pdfs.semanticscholar.org/f0c1/e30566e69e73841a52476cab5c1f5014518f.pdf
Ling Ping, H., & Islam, M. A. (2008). Analyzing incomplete categorical data: Revisiting maximum likelihood estimation (MLE) procedure. Journal of Modern Applied Statistical Methods, 7(2): 488–500.
MD Desa, Z. N. D. (2012). Bi-factor multidimensional item response theory modeling for subscores estimation, reliability, and classification. A dissertation for the degree of PhD, Department of Psychology and Research in Education, University of Kansas.
Montgomery, M. (2014). Unidimensional models do not fit unidimensional mixed format data better than multidimensional models. A dissertation for the degree of PhD, Department of Psychology and Research in Education, University of Kansas.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., Trong, K. L., & Sainsbury, M. (2009). PIRLS 2011 assessment framework. Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA); TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Rauch, D. P., & Hartig, J. (2010). Multiple-choice versus open-ended response formats of reading test items: A two-dimensional IRT analysis. Psychological Test and Assessment Modeling, 52(4): 354–379.
Ravand, H. (2015). Assessing testlet effect, impact, differential testlet, and item functioning using cross-classified multilevel measurement modeling. Retrieved June 20, 2017, from http://journals.sagepub.com/doi/pdf/10.1177/2158244015585607
Rijmen, F. (2011). Hierarchical factor item response theory models for PIRLS: Capturing clustering effects at multiple levels. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 4: 59–74.
Suksuwan, S., Junpeng, P., Ngudgratoke, S., & Guayjarernpanishk, P. (2012). The effect of proportion common items with mixed format test on multidimensional item response theory linking. Procedia - Social and Behavioral Sciences, 69: 1505–1511.
Tao, W. (2008). Using the score-based testlet method to handle local item dependence. A dissertation for the degree of PhD, Department of Educational Research, Measurement, and Evaluation, Boston College.
Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage Publications.
Wiberg, M. (2007). Measuring and detecting differential item functioning in criterion-referenced licensing test: A theoretic comparison of methods. Educational Measurement Technical Report No. 2.
Yen, W. M. (1993). Scaling performance assessment: Strategies for managing local item dependence. Journal of Educational Measurement, 30: 187–213.
Zenisky, A. P., Hambleton, R. K., & Sireci, S. G. (2001). Effects of local item dependence on the validity of IRT item, test, and ability statistics. MCAT Monograph No. 5. Association of American Medical Colleges, Section for the Medical College Admission Test.