Document Type: Research Paper

Abstract

International tests such as TIMSS and PIRLS must have structural equivalence, also known as structural comparability; that is, test items should function identically for all participating countries and groups. The present study examined the structural comparability of the TIMSS 2007 8th-grade science test, the differential functioning of its items between Iranian and American students, and the effect of differentially functioning items on the performance of Iranian students. A combination of confirmatory factor analysis and item response theory was used to analyze the data and answer the research questions. The results of the factor analysis indicated that the science test was structurally comparable across the two groups, which suggests that Iranian and American students use the same conceptual framework to answer the test items. The differential item functioning analysis, however, indicated that 62% of the TIMSS 2007 8th-grade science test items functioned differentially against Iranian students. Overall, the poor performance of Iranian students on the TIMSS 2007 science test cannot be attributed to differential item functioning, and its causes should be sought elsewhere. Ministry of Education officials and administrators should teach the key concepts of each field in an interrelated manner and cultivate divergent and multidimensional thinking skills in students by preparing appropriate educational content. Moreover, the system should move away from the traditional method of teaching, which consists mainly of theoretical lectures, and involve students in practical and laboratory activities.
 
