Article type: Research article

Authors

1 PhD student in Assessment and Measurement, University of Tehran, Tehran, Iran.

2 Associate Professor, University of Tehran, Tehran, Iran

3 Assistant Professor, Department of Curriculum Planning, Kharazmi University, Tehran, Iran

4 Associate Professor, Faculty of Psychology and Educational Sciences, University of Tehran, Tehran, Iran

5 Assistant Professor, National Organization for Educational Testing, Tehran, Iran

Abstract

The Youden index is a common summary measure of the receiver operating characteristic (ROC) curve that both gauges the effectiveness of a criterion-referenced test and identifies the test's cut score. This study compares and evaluates three nonparametric estimators of the Youden index: the empirical method, the kernel method with Silverman's bandwidth, and the kernel method with maximum likelihood cross-validation bandwidth. Performance was evaluated using the bootstrap standard error (BSE), root mean squared error (RMSE), integrated squared error (ISE), and mean integrated squared error (MISE). The results show that the kernel method with maximum likelihood cross-validation yielded a higher Youden index. The cut score obtained was 479 for the kernel methods and 465 for the empirical method. Given the acceptable values of the evaluation indices, kernel methods, especially with the optimal maximum likelihood cross-validation bandwidth, lead to more reliable estimates of the Youden index and the cut score for the Tolimo test.

Keywords

Article title [English]

Comparing nonparametric methods for determining cut scores in criterion-referenced tests using the Youden index (subject of study: Tolimo test)

Authors [English]

  • Maryam Parsaeian 1
  • Ebrahim Khodaie 2
  • Balal Izanloo 3
  • Keyvan Salehi 4
  • Sima Naghizadeh 5

1 University of Tehran

2 Tehran University

3 Faculty of Psychology and Education, Kharazmi University, Tehran, Iran.

4 Associate professor, Division of Research and Assessment, Faculty of Psychology and Education, University of Tehran, Tehran, Iran

5 National Organization for Educational Testing

Abstract [English]

The Youden index is a commonly used summary measure of the receiver operating characteristic (ROC) curve that both measures the performance of a criterion-referenced test and specifies the test's cutoff score. This research compares and evaluates three nonparametric methods for estimating the Youden index: the empirical method, the kernel method with Silverman's bandwidth, and the kernel method with maximum likelihood cross-validation bandwidth. Bootstrap standard error (BSE), root mean squared error (RMSE), integrated squared error (ISE), and mean integrated squared error (MISE) were used to evaluate performance. The results show that the kernel method with maximum likelihood cross-validation produced a higher Youden index value. The cutoff scores obtained were 479 for the kernel methods and 465 for the empirical method. Given the acceptable values of the evaluation indices, kernel methods, especially with the optimal maximum likelihood cross-validation bandwidth, lead to more reliable estimates of the Youden index and the cutoff score for the Tolimo test.
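The three estimators compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a Gaussian kernel, a simple grid search for the likelihood cross-validation bandwidth, simulated scores rather than Tolimo data, and the convention that examinees scoring at or above the cut are classified as masters. All function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def normal_cdf(z):
    """Standard normal CDF, evaluated elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def silverman_bandwidth(x):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)


def mlcv_bandwidth(x, factors=np.linspace(0.2, 2.0, 37)):
    """Grid search for the bandwidth maximizing the leave-one-out log-likelihood.

    Assumption: candidates are multiples of the Silverman bandwidth.
    """
    x = np.asarray(x, dtype=float)
    diffs = x[:, None] - x[None, :]
    best_h, best_ll = silverman_bandwidth(x), -np.inf
    for h in factors * silverman_bandwidth(x):
        k = np.exp(-0.5 * (diffs / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
        np.fill_diagonal(k, 0.0)  # leave each point out of its own estimate
        with np.errstate(divide="ignore"):
            ll = np.log(k.sum(axis=1) / (x.size - 1)).sum()
        if ll > best_ll:
            best_h, best_ll = h, ll
    return float(best_h)


def empirical_youden(low, high):
    """Empirical J = max over observed scores c of sensitivity(c) + specificity(c) - 1."""
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    cuts = np.unique(np.concatenate([low, high]))
    spec = np.array([(low < c).mean() for c in cuts])    # non-masters below the cut
    sens = np.array([(high >= c).mean() for c in cuts])  # masters at or above the cut
    j = sens + spec - 1.0
    best = int(np.argmax(j))
    return float(j[best]), float(cuts[best])


def kernel_youden(low, high, bandwidth=silverman_bandwidth, grid_size=512):
    """Kernel J = max over a grid of F_low(c) - F_high(c), using smoothed CDFs."""
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    h_low, h_high = bandwidth(low), bandwidth(high)
    grid = np.linspace(min(low.min(), high.min()), max(low.max(), high.max()), grid_size)
    f_low = normal_cdf((grid[:, None] - low[None, :]) / h_low).mean(axis=1)
    f_high = normal_cdf((grid[:, None] - high[None, :]) / h_high).mean(axis=1)
    j = f_low - f_high
    best = int(np.argmax(j))
    return float(j[best]), float(grid[best])


# Simulated scores for illustration only (not Tolimo data).
rng = np.random.default_rng(0)
non_masters = rng.normal(440, 40, 300)
masters = rng.normal(520, 40, 300)
print("empirical        :", empirical_youden(non_masters, masters))
print("kernel, Silverman:", kernel_youden(non_masters, masters))
print("kernel, MLCV     :", kernel_youden(non_masters, masters, bandwidth=mlcv_bandwidth))
```

Each function returns the estimated Youden index together with the cut score at which it is attained. The sketch does not guard against a zero bandwidth, which can occur when a group's scores are heavily tied.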

Keywords [English]

  • cut scores
  • Youden index
  • nonparametric methods
  • criterion referenced tests