Hierarchical linear modeling of the relationship between the H^T person-fit statistic and students' contextual variables in the TIMSS 2015 eighth-grade mathematics test

Article type: Research article

Authors

1 PhD student in Assessment and Measurement, University of Tehran, Tehran, Iran

2 Associate Professor of Educational Methods and Curriculum, University of Tehran, Tehran, Iran

3 Associate Professor of Assessment and Measurement, Allameh Tabataba'i University, Tehran, Iran

4 Associate Professor of Educational Psychology, University of Saskatchewan, Saskatchewan, Canada

5 Associate Professor of Educational Methods and Curriculum, University of Tehran, Tehran, Iran

Abstract

Person-fit assessment is useful for ensuring the validity and fairness of the use and interpretation of test scores. In this study, the H^T person-fit statistic was used to examine the response patterns of students from Australia, Iran, and the Republic of Korea on the TIMSS 2015 eighth-grade mathematics test. Because the data are hierarchically structured, hierarchical linear modeling was used to examine the effect of contextual variables on the value of students' person-fit statistic. According to the intraclass correlation coefficient, 83.7% of the variance of the H^T person-fit statistic lies at the student level and 16.3% at the school and country levels. Moreover, in the final hierarchical linear model relating the H^T person-fit statistic to student-, school-, and country-level factors, only four variables have significant coefficients: countries' mean achievement, the school's emphasis on students' academic success, students' confidence in learning mathematics, and the estimate of students' ability.
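
To illustrate what the H^T statistic measures, here is a minimal Python sketch. It assumes the common definition of H^T for dichotomous (0/1) item scores: the sum of one person's response-vector covariances with every other person, divided by the largest value that sum could take given the persons' proportions correct. The function name `ht_statistic` and the toy response matrix are hypothetical and are not taken from the study, which may have relied on different software and data handling.

```python
def ht_statistic(responses):
    """Return an H^T value for each person (row) of a 0/1 persons-by-items matrix.

    Assumed definition: sum of covariances with all other persons divided by
    the sum of the maximum covariances attainable given proportions correct.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    # Proportion of items answered correctly by each person.
    p = [sum(row) / n_items for row in responses]

    values = []
    for n in range(n_persons):
        num = 0.0  # observed covariances with the other persons
        den = 0.0  # maximum possible covariances with the other persons
        for m in range(n_persons):
            if m == n:
                continue
            # Proportion of items that both persons answered correctly.
            joint = sum(a * b for a, b in zip(responses[n], responses[m])) / n_items
            num += joint - p[n] * p[m]
            den += min(p[n], p[m]) * (1.0 - max(p[n], p[m]))
        # H^T is undefined when no covariance is possible (e.g. all-0 or all-1 rows).
        values.append(num / den if den > 0 else float("nan"))
    return values


# Hypothetical toy data: three consistent (Guttman-like) patterns and one
# pattern that succeeds only on the item everyone else misses.
toy = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
print(ht_statistic(toy))
```

In this toy matrix the last person's value is negative, which is the kind of response pattern a person-fit analysis would flag as aberrant; values near 1 indicate patterns consistent with the rest of the group.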

Keywords


Article title [English]

Hierarchical linear modeling between students' H^T person-fit statistic and contextual variables in the eighth-grade mathematics test of TIMSS 2015

Authors [English]

  • Pouria Rezasoltani 1
  • Ebrahim Khodaie 2
  • Jalil Younesi 3
  • Amin Mousavi 4
  • Ali Moghadamzade 5
1 PhD Student in Assessment and Measurement, University of Tehran, Tehran, Iran
2 Associate Professor of Educational Methods and Curriculum, University of Tehran, Tehran, Iran
3 Associate Professor of Assessment and Measurement, Allameh Tabataba'i University, Tehran, Iran
4 Associate Professor of Educational Psychology, University of Saskatchewan, Saskatchewan, Canada
5 Associate Professor of Educational Methods and Curriculum, University of Tehran, Tehran, Iran
Abstract [English]

Person-fit assessment helps ensure validity and fairness in the use and interpretation of test scores. This study applied the H^T person-fit statistic to examine the response patterns of students from Australia, Iran, and the Republic of Korea on the TIMSS 2015 eighth-grade mathematics test. Because the data are hierarchically structured, hierarchical linear modeling was used to investigate the effect of contextual variables on students' person-fit statistic. Based on the intraclass correlation coefficient, 83.7% of the variance of the H^T person-fit statistic lies at the student level and 16.3% at the school and country levels. In addition, in the final hierarchical linear model relating the H^T person-fit statistic to student-, school-, and country-level factors, only countries' average mathematics achievement, school emphasis on students' academic success, students' confidence in mathematics, and the estimate of students' ability are significant.
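
The variance split reported above comes from intraclass correlation coefficients, which are ratios of variance components from an unconditional (null) multilevel model. The sketch below shows that arithmetic for a three-level structure of students within schools within countries. The function name `variance_shares` and the variance components passed to it are hypothetical, chosen only to make the calculation easy to follow; they are not the study's estimates, and the abstract does not report how the 16.3% divides between schools and countries.

```python
def variance_shares(student_var, school_var, country_var):
    """Share of total outcome variance located at each level of a 3-level null model."""
    total = student_var + school_var + country_var
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return {
        "student": student_var / total,
        "school": school_var / total,
        "country": country_var / total,
    }


# Hypothetical variance components (NOT the study's estimates).
shares = variance_shares(student_var=8.0, school_var=1.5, country_var=0.5)

# Combined share above the student level, the quantity the abstract reports
# jointly for schools and countries.
higher_levels = shares["school"] + shares["country"]

print(f"student level: {shares['student']:.1%}")
print(f"school + country levels: {higher_levels:.1%}")
```

A sizable share of variance above the student level is the usual justification for a hierarchical model over a single-level regression, since observations within the same school or country are then not independent.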

Keywords [English]

  • H^T person-fit statistic
  • hierarchical linear modeling
  • person fit assessment
  • response pattern
  • validity of test scores