Document Type: Research Paper

Authors

1 Department of Psychology

2 Tehran University

3 Assistant Professor, Kharazmi University

4 National Center for International Studies of PIRLS and TIMSS

Abstract

The present study investigated the dimensionality and differential item functioning of the testlet-based test of Iran's PIRLS 2011. To analyze dimensionality, the graded response and bifactor item response theory models were fitted using the full-information maximum likelihood estimation method; to analyze differential item functioning, the multiple-group bifactor model of Cai et al. (2011) was applied. The dimensionality results showed that the bifactor model fit the data better than the graded response model, both in Iran's total sample and in the boy and girl groups separately. The testlet-effect variance estimates showed that the effects of the specific (testlet) factors on Iranian students' performance in the two testlets related to literal comprehension produced multidimensionality in Iran's PIRLS testlets. There was no significant difference between boys' and girls' average performance on the general latent trait of reading comprehension, but the differences in average reading proficiency on three literal and three informational testlets were significant in favor of girls. The differential item functioning results based on the bifactor model showed that many items exhibited uniform or non-uniform differential item functioning, with boys performing better on multiple-choice items and girls on constructed-response items. Overall, the results showed that in Iran's PIRLS 2011 testlets, the traits measured by the two literal comprehension testlets were perceived differently by boy and girl students, and these two testlets showed greater local item dependence among girls than among boys. The results also indicated a difference between the performance of Iranian boys and girls on the mixed-format PIRLS test.
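For concreteness, the bifactor structure referred to above can be sketched as follows; this is a minimal illustration under the standard logistic parameterization, with notation of our own rather than the paper's. Each item j loads on the general reading trait and on exactly one testlet-specific factor, and the factors are orthogonal:

\[
P\bigl(X_{ij} \ge k \mid \theta_{i0}, \theta_{is(j)}\bigr)
= \frac{1}{1 + \exp\!\bigl[-\bigl(a_{j0}\,\theta_{i0} + a_{js}\,\theta_{is(j)} - b_{jk}\bigr)\bigr]}
\]

Here \(\theta_{i0}\) is student i's general reading comprehension, \(\theta_{is(j)}\) is the specific factor of the testlet containing item j, \(a_{j0}\) and \(a_{js}\) are the general and specific discriminations, and \(b_{jk}\) is the threshold of response category k. The unidimensional graded response model is the constrained case with all \(a_{js} = 0\), which is why comparing the fit of the two models indexes the testlet (local item dependence) effects.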

References

Alper Kose, I., & Demirtasli, N. C. (2012). Comparison of unidimensional and multidimensional models based on item response theory in terms of both variables of test length and sample size. Procedia - Social and Behavioral Sciences, 46, 135-140.
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16(3), 221-248.
DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2), 145-168.
DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International Journal of Testing, 13(4), 354-378.
Devins, G. M., Beiser, M., Dion, R., Pelletier, L. G., & Edwards, R. G. (1997). Cross-cultural measurements of psychological well-being: The psychometric equivalence of Cantonese, Vietnamese, and Laotian translations of the Affect Balance Scale. American Journal of Public Health, 87, 794-799.
Duan, J. C., Hardle, W. K., & Gentle, J. E. (2012). Handbook of computational finance. Berlin-Heidelberg: Springer.
Fukuhara, H. (2009). A differential item functioning model for testlet-based items using bi-factor multidimensional item response theory model: A Bayesian approach (Doctoral dissertation). Department of Educational Psychology and Learning Systems, Florida State University.
Fukuhara, H., & Kamata, A. (2011). A bifactor multidimensional item response theory model for differential item functioning analysis on testlet-based items. Applied Psychological Measurement, 35(8), 604-622.
Goldstein, H. (2008). How may we use international comparative studies to inform education policy? Retrieved February 19, 2015, from http://www.bristol.ac.uk/medialibrary/sites/cmm/migrated/documents/how-useful-are-international-comparative-studies-in-education.pdf
Johnson, T. P. (1998). Approaches to equivalence in cross-cultural and cross-national survey research. In J. Harkness (Ed.), Cross-cultural survey equivalence (pp. 1-40). ZUMA-Nachrichten Spezial Band 3. Mannheim: ZUMA.
Kirsch, I., De Jong, J., Lafontaine, D., McQueen, J., Mendelovits, J., & Monseur, C. (2002). Reading for change: Performance and engagement across countries. Results from PISA 2000. New York: Organisation for Economic Cooperation and Development (OECD).
Lenkeit, J., Chan, J., Hopfenbeck, T. N., & Baird, J. A. (2015). A review of the representation of PIRLS related research in scientific journals. Educational Research Review, 16, 102-115. http://dx.doi.org/10.1016/j.edurev.2015.10.002
Lin, J., & Wu, F. (2003). Differential performance by gender in foreign language testing. The University of Alberta, Centre for Research in Applied Measurement and Evaluation. Retrieved February 19, 2015, from https://pdfs.semanticscholar.org/f0c1/e30566e69e73841a52476cab5c1f5014518f.pdf
Ling Ping, H., & Islam, M. A. (2008). Analyzing incomplete categorical data: Revisiting maximum likelihood estimation (MLE) procedure. Journal of Modern Applied Statistical Methods, 7(2), 488-500.
Md Desa, Z. N. D. (2012). Bi-factor multidimensional item response theory modeling for subscores estimation, reliability, and classification (Doctoral dissertation). Department of Psychology and Research in Education, University of Kansas.
Montgomery, M. (2014). Unidimensional models do not fit unidimensional mixed format data better than multidimensional models (Doctoral dissertation). Department of Psychology and Research in Education, University of Kansas.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., Trong, K. L., & Sainsbury, M. (2009). PIRLS 2011 assessment framework. Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA); TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Rauch, D. P., & Hartig, J. (2010). Multiple-choice versus open-ended response formats of reading test items: A two-dimensional IRT analysis. Psychological Test and Assessment Modeling, 52(4), 354-379.
Ravand, H. (2015). Assessing testlet effect, impact, differential testlet, and item functioning using cross-classified multilevel measurement modeling. Retrieved June 20, 2017, from http://journals.sagepub.com/doi/pdf/10.1177/2158244015585607
Rijmen, F. (2011). Hierarchical factor item response theory models for PIRLS: Capturing clustering effects at multiple levels. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 4, 59-74.
Suksuwan, S., Junpeng, P., Ngudgratoke, S., & Guayjarernpanishk, P. (2012). The effect of proportion common items with mixed format test on multidimensional item response theory linking. Procedia - Social and Behavioral Sciences, 69, 1505-1511.
Tao, W. (2008). Using the score-based testlet method to handle local item dependence (Doctoral dissertation). Department of Educational Research, Measurement, and Evaluation, Boston College.
Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage Publications.
Wiberg, M. (2007). Measuring and detecting differential item functioning in criterion-referenced licensing test: A theoretic comparison of methods (Educational Measurement, Technical Report No. 2).
Yen, W. M. (1993). Scaling performance assessment: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.
Zenisky, A. P., Hambleton, R. K., & Sireci, S. G. (2001). Effects of local item dependence on the validity of IRT item, test, and ability statistics (MCAT Monograph No. 5). Association of American Medical Colleges, Section for the Medical College Admission Test.