بودون، ریمون (1373). روشهای جامعهشناسی، ترجمه عبدالحسین نیک گهر، شرکت انتشارات علمی و فرهنگی، چاپ دوم.
سرمد، زهره؛ بازرگان، عباس؛ و حجازی، الهه (1384). روشهای تحقیق در علوم رفتاری. تهران. نشر آگاه.
کبیری، مسعود؛ کریمی، عبدالعظیم؛ و بخشعلی زاده، شهرناز (1395). یافتههای ملی تیمز 2015، روند 20 ساله آموزش علوم و ریاضیات ایران در چشمانداز بینالمللی. پژوهشگاه مطالعات آموزشوپرورش. انتشارات مدرسه.
نقش، زهرا؛ و مقدم، اعظم (1391). کاربرد تکنیکهای مدلیابی چندسطحی در تحلیل دادههای تیمز 2007 و مقایسه آن با تحلیل یکسطحی. فصلنامه اندازهگیری تربیتی. دوره دوم. شماره هشتم.
Alivernini,F., Manganelli, S., & Vinci, E. (2008). Multilevel analysis of PIRLS 2006 data for Italy. Paper presented at the 3rd IEA International Research Conference, Taipei, and Chinese Taipei.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Armstrong, R. D., & Shi, M. (2009). An IRT-based cumulative sum statistic for person fit. Applied Psychological Measurement, 33(5), 391-410.
Childs, R. A., & Jaciw, A. P. (2003). Matrix sampling of items in large-scale assessments. Practical Assessment, Research & Evaluation, 8(16).
Coleman, J. S., Compbell, E. Q., Hobson, C. J., Mcpartland, J., Mood, A. M., Weinfeld, F. D., et al. (1966). Equality of Educational Opportunity. Washington DC: Department of Helth, Education & Welfare Office of Education.
Conijn, J. M., Emons, W. H. M., & Sijtsma, K. (2014). Statistics lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38(2), 122-136.
Conijn, J. M., Sijtsma, K., & Emons, W. H. M. (2016). Identifying person-fit latent classes, and explanation of categorical and continuous person misfit. Applied Psychological Measurement, 40(2), 128-141.
Cui, Y., & Li, J. (2015). Evaluating Person fit for cognitive diagnostic assessment. Applied Psychological Measurement, 39(3), 223-238.
Cui, Y., & Mousavi, A. (2015). Explore the usefulness of person-fit analysis on large-scale assessment. International Journal of Testing, 15(1), 23-49.
De la Torre, J., & Deng, W. (2008). Improving person-fit assessment by correcting the ability estimate and its reference distribution. Journal of Educational Measurement, 45(2), 159–177.
De Leeuw, J. and Kreft, I. G. G. (1986). Random coefficient models for multilevel analysis. Journal of Educational and Behavioral Statistics, 11(1), 57-85.
Dempster, A. P., Rubin, D. B., & Tsutakawa, R. K. (1981). Estimation in covariance components models. Journal of the American Statistical Association, 76, 341-353.
Dodeen, H., & Darabi, M. (2009). Person-fit: relationship with four personality tests in mathematics. Research Papers in Education, 24(1), 115–126.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86.
Du Toit, M. (2002). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Scientific Software International: Lincolnwood, IL, USA.
Emons, W. H. M., Sijtsma, K., Meijer, R. R. (2005). Global, local, and graphical person-fit analysis using person-response functions. Psychological Methods, 10(1), 101-119.
Ferrando, P. J. (2012). Assessing inconsistent responding in E and N measures: An application of person-fit analysis in personality. Personality and Individual Differences, 52(6), 718-722.
Finkelman, M., & Kim, W. (2007). Using person fit in a body of work standard setting. Paper presented at the American Educational Research Association, Chicago, IL, USA.
Goldstein, H. (2003). Multilevel Statistical Models, 3rd ed. London: Hodder Arnold.
Guo, J., & Drasgow, F. (2010). Identifying cheating on unprotected internet tests: The Z-test and the likelihood ratio test. International Journal of Selection and Assessment, 18(4), 351–364.
Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139–150.
Harnisch, D. L., & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18(3), 133–146.
Harrison, D. A., McLaughlin, M. E., & Coalter, T. M. (1996). Context, cognition and common method variance: Psychometric and verbal protocol evidence. Organizational Behavior and Human Decision Processes, 68(3), 246–261.
Harter, S. (1985). Manual for the self-perception profile for children. Denver, CO: University of Denver.
Hox, J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Lawrence Erlbaum.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics, Applied Measurement in Education, 16(4), 277-298.
Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-974.
Lamprianou, I. (2010). The practical application of optimal appropriateness measurement of empirical data using Rash models. Journal of Applied Measurement, 11(4), 409–423.
Lamprianou, I., & Boyle, B. (2004). Accuracy of measurement in the context of mathematics national curriculum tests in England for ethnic minority pupils and pupils who speak English as an additional language. Journal of Educational Measurement, 41(3), 239–259.
Lanyon, R. I., & Goodstein, L. D. (1997). Personality assessment (3rd ed.). New York, NY: Wiley.
Levine, M. V., & Drasgow, F. (1988). Optimal appropriateness measurement. Psychometrika, 53(2), 161–176.
Liu, M. T., & Yu, P. T. (2011). Aberrant learning achievement detection based on person-fit statistics in personalized e-learning systems learning systems. Educational Technology & Society, 14(1), 107-120.
Longford, N. T. (1987). A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika, 74(4), 817-827.
Longford, N. T. (1993). Random Coefficient Models. New York: Oxford University Press.
Martin, M. O., Mullis, I. V.S., & Hooper, M. (2016). Methods and procedures in TIMSS 2015. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Meijer, R. R. (1997). Person fit and criterion-related validity: An extension of the Schmitt, Cortina, and Whitney study. Applied Psychological Measurement, 21(2), 99 -113.
Meijer, R. R., Egberink, J. H. L., Emons, W. H. M., & Sijtsma, K. ( 2008). Detection and validation of unscalable item score patterns using item response theory: An illustration with Harter’s self-perception profile for children. Journal of Personality Assessment, 90(3), 227–238.
Meijer, R. R., & Sijtsma, K. (1995). Detection of aberrant item score patterns: A review of recent developments. Applied Measurement in Education, 8(3), 261–272.
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107-135.
Mousavi, S. A. (2015). The effect of person misfit on item parameter estimation: A simulation study. Doctoral dissertation, University of Alberta.
Mousavi, A., Tendeiro, J. N., & Younesi, J. (2016). Person fit assessment using the PerFit package in R. The Quantitative Methods for Psychology. 12(3), 232-242.
Olson, J. F., Martin, M. O., & Mullis, I. V.S. (2008). TIMSS 2007 Technical Report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Petridou, A., & Williams, J. (2007). Accounting for aberrant test response patterns using multilevel models. Journal of Educational Measurement, 44(3), 227–247.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd ed. Thousand Oaks, CA: Sage.
Rudner, L. M., Bracey, G., & Skaggs, G. (1996). The use of a person-fit statistic with one high quality achievement test. Applied Measurement in Education, 9(1), 91–109.
Rupp, A. A. (2013). A systematic review of the methodology for person fit research in Item Response Theory: Lessons about generalizability of inferences from the design of simulation studies. Psychological Test and Assessment Modeling, 55(1), 3-38.
Schmitt, N. S., Cortina, J. M., & Whitney, D. J. (1993). Appropriateness fit and criterion-related validity. Applied Psychological Measurement, 17(2), 143-150.
Sijtsma, K. (1986). A coefficient of deviance of response patterns. Kwantitatieve Methoden, 7(22), 131–145.
Sijtsma, K., & Meijer, R. R. (1992). A method for investigating the intersection of item response function in Mokken’s nonparametric IRT model. Applied Psychological Measurement, 16(2), 149-157.
Smith, R. M. (1985). A comparison of Rasch person analysis and robust estimators. Educational andPsychological Measurement, 45(3), 433–444.
Snijders, T. B. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66(3), 331-342.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling (1st ed.): Thousand Oaks: Sage Publications.
Tatsuoka, K. K., & Tatsuoka, M. M. (1983). Spotting erroneous rules of operation by the individual consistency index. Journal of Educational Measurement, 20(3), 221–230.
Tendeiro, J. N., Meijer, R. R., Schakel, L., & Maij-de Meij, A. M. (2013). Using cumulative sum statistics to detect inconsistencies in unproctored Internet testing. Educational and Psychological Measurement, 73(1), 143-161.
Trabin, T. E., & Weiss, D. J. (1983). The person response curve: fit of individuals to item response theory models. In D. J. Weiss (Ed.), new horizons in testing. New York: Academic Press.
Van der Flier, H. (1982). Deviant response patterns and comparability of test scores. Journal of Cross-Cultural Psychology, 13(3), 267–298.
Woods, C. M., Oltmanns, T. F., & Turkheimer, E. (2008). Detection of aberrant responding on a personality scale in a military sample: an application of evaluating person fit with two-level logistic regression. Psychological Assessment, 20(2), 159-168.
Wright, B. D., & Masters, G. N. (1982). Rating scaleanalysis. Chicago: MESA Press.
Wright, B. D., & Stone, M. H. (1979). Best test design. Rasch measurement. Chicago: Mesa Press.