Maryam Parsaeian; Ebrahim Khodaie; Balal Izanloo; Keyvan Salehi; Sima Naghizadeh
Abstract
The Youden index is a commonly used summary measure of the Receiver Operating Characteristic (ROC) curve that both quantifies the performance of a criterion-referenced test and identifies the cutoff score for the test. This research compares and evaluates three non-parametric methods for estimating Youden's index: the empirical method, the kernel method with Silverman's bandwidth, and the kernel method with maximum likelihood cross-validation bandwidth. Bootstrap standard error (BSE), root mean square error (RMSE), integrated squared error (ISE), and mean integrated squared error (MISE) were used to evaluate performance. The results show that the kernel method with maximum likelihood cross-validation bandwidth yielded a higher Youden index value. The obtained cutoff scores were 479 for the kernel methods and 465 for the empirical method. Given the acceptable values of the evaluation indices, kernel methods, especially with the maximum likelihood cross-validation bandwidth, lead to more reliable estimates of the Youden index and the cutoff score for Tolimo test results.
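As an illustration of the estimators compared in this abstract, the following is a minimal sketch of the empirical Youden index and a Gaussian-kernel version using Silverman's rule-of-thumb bandwidth. It is not the authors' implementation: the function names, the grid search, the Gaussian kernel, and the assumption that higher scores indicate the positive group are all choices made here for illustration, and the maximum likelihood cross-validation bandwidth is not shown.

```python
import math
import statistics


def empirical_youden(neg, pos):
    """Return (J, cutoff) maximizing sensitivity + specificity - 1.

    Assumption: scores >= cutoff are classified as positive.
    """
    best_j, best_c = -1.0, None
    for c in sorted(set(neg) | set(pos)):
        sens = sum(x >= c for x in pos) / len(pos)
        spec = sum(x < c for x in neg) / len(neg)
        j = sens + spec - 1
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-1 / 5)


def kernel_cdf(c, data, h):
    """Gaussian-kernel-smoothed estimate of P(X <= c)."""
    return sum(0.5 * (1 + math.erf((c - x) / (h * math.sqrt(2)))) for x in data) / len(data)


def kernel_youden(neg, pos, grid_size=200):
    """Return (J, cutoff) maximizing F_neg(c) - F_pos(c) over an evenly spaced grid."""
    h_neg, h_pos = silverman_bandwidth(neg), silverman_bandwidth(pos)
    scores = list(neg) + list(pos)
    lo, hi = min(scores), max(scores)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    return max((kernel_cdf(c, neg, h_neg) - kernel_cdf(c, pos, h_pos), c) for c in grid)
```

Because the kernel estimate smooths both score distributions before searching for the maximum, its cutoff need not coincide with an observed score, which is one reason the two approaches can return different cutoffs.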
Ali Baniasadi; Keyvan Salehi; Ebrahim Khodaie; Khosro Bagheri; Balal Izanloo
Abstract
The present study aimed to investigate the psychometric properties of a fair classroom assessment rubric based on item response theory. For this purpose, a sample of 511 students of the University of Tehran was selected by convenience sampling and answered the rubric questions. To decide between unidimensional and multidimensional models, the DETECT and parallel analysis methods were used. The results of both methods rejected the unidimensionality of the data, and the parallel analysis indicated that three factors should be extracted. In addition, a comparison of the fit indices of the unidimensional and multidimensional models, including the log-likelihood, the likelihood ratio, the Root Mean Square Error of Approximation, and the Bayesian and Akaike information criteria, confirmed the better fit of the multidimensional model. Because the responses to the questions were polytomous, the multidimensional graded response model was used to estimate the item parameters. The reliabilities of the procedural fairness, nature of assessment, and interactional fairness subscales were 0.85, 0.69, and 0.63, respectively. The estimated discrimination parameters ranged from 1.048 to 5.802, which showed that all the questions discriminated well between higher and lower levels of fair classroom assessment, and after controlling the false discovery rate, the S-X2 statistic showed a good fit for all rubric questions. In general, the results of this study show that the developed rubric has appropriate psychometric properties for evaluating the quality of fairness in classroom assessment.
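For readers unfamiliar with the graded response model, the sketch below computes category probabilities for a single polytomous item in the simpler unidimensional case. The study itself used a multidimensional model; the function name and parameter values here are hypothetical and serve only to show how a discrimination parameter and ordered thresholds produce response probabilities.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a unidimensional graded response model.

    theta: latent trait level; a: discrimination; thresholds: increasing list b_1 < ... < b_K.
    Returns K + 1 probabilities, one per ordered response category.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Illustrative values only: a four-category item with discrimination 1.5.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

A larger discrimination value makes the cumulative curves steeper, so the item separates respondents just above and just below each threshold more sharply.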
Pouria Rezasoltani; Ebrahim Khodaie; Jalil Younesi; Amin Mousavi; Ali Moghadamzade
Abstract
Person fit assessment is useful in ensuring validity and fairness in the use and interpretation of test scores. In this research, the H^T person fit statistic was applied to examine the response patterns on the TIMSS eighth-grade mathematics test in Australia, Iran, and the Republic of Korea. Because of the hierarchical structure of the data, hierarchical linear modeling was used to investigate the effect of contextual variables on students' person fit statistics. Based on the intraclass correlation coefficient, 83.7% of the variance of the H^T person fit statistic lies at the student level, and 16.3% lies at the school and country levels. In addition, in the final hierarchical linear model relating the H^T person fit statistic to student, school, and country factors, only the country average of students' mathematics achievement, school emphasis on students' academic success, students' confidence in mathematics, and the estimate of students' ability were significant.
Somayeh Kaveh; Ebrahim Khodaie; Amin Musavi; Ali Moghadamzadeh; Jalil Younesi
Abstract
To facilitate their interpretation, raw scores are usually converted to scale scores. In some cases, these conversions are a series of nonlinear transformations that can affect the conditional standard error of measurement across the score scale. The purpose of this study was therefore to introduce methods for calculating the conditional standard error of measurement based on strong true score theory. The study also compared normalized and equipercentile nonlinear transformations of the raw academic achievement scores of the 2014 mathematical sciences graduates, and examined their effect on the conditional standard error of measurement. To this end, we used a sample of 3943 graduates of the Mathematics and Physics high school track in 2014 who had participated in the national university entrance examination in 2015, randomly selected by the National Organization of Educational Testing. The conditional standard error of measurement under these transformations was estimated using the binomial procedure of Brennan and Lee (1999) and the method of Chang (2006), which is based on the beta-binomial distribution. The results indicated that the conditional standard errors of measurement from Chang's method were smoother than those from the binomial procedure, but in both methods the estimated errors were larger in the middle of the scale and smaller at the extremes. Additionally, the conditional standard errors of measurement under the equipercentile transformation were always smaller than those under the normalized transformation, so the equipercentile method was found to be better than the normalized transformation.
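To make the idea of a conditional standard error under a binomial error model concrete, here is a simplified sketch, not the exact procedure of the cited authors. It assumes the proportion-correct true score at raw score x is x / n, treats observed raw scores as binomial around it, and takes the standard deviation of the transformed scores as the conditional standard error; the function name and the `scale` argument are hypothetical.

```python
import math


def binomial_csem(x, n_items, scale=lambda raw: raw):
    """Conditional SEM of scale scores at raw score x under a simple binomial error model.

    Assumptions: true proportion-correct is x / n_items; `scale` maps a raw score
    to a scale score (identity by default, which gives the raw-score result).
    """
    pi = x / n_items
    # Binomial probability of each possible raw score given the assumed true score.
    probs = [math.comb(n_items, k) * pi ** k * (1 - pi) ** (n_items - k)
             for k in range(n_items + 1)]
    scores = [scale(k) for k in range(n_items + 1)]
    mean = sum(p * s for p, s in zip(probs, scores))
    var = sum(p * (s - mean) ** 2 for p, s in zip(probs, scores))
    return math.sqrt(var)
```

With the identity transformation the result reduces to the square root of n·π·(1 − π), which is largest in the middle of the raw-score range and zero at the extremes, the same pattern the abstract reports. Passing a nonlinear `scale` function shows how a transformation can change that shape.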
Mohammad Ahmadi Deh Qutbuddini; Ebrahim Khodai; Valiollah Farzad; Ali Moghadam Zadeh; Masoud Kabiri
Abstract
The present study investigated the dimensionality and differential item functioning of the testlet-based test of Iran's PIRLS 2011. To analyze dimensionality, graded response and bi-factor item response theory models were fitted using full-information maximum likelihood estimation, and to analyze differential item functioning, the multiple-group bi-factor model of Cai et al. (2011) was applied. The dimensionality analysis showed that the bi-factor model fitted the data better than the graded response model, both in Iran's total sample and in the boys' and girls' groups. The testlet effect variances showed that the effects of secondary factors on Iranian students' performance in two testlets related to literal comprehension caused multidimensionality in Iran's PIRLS testlets. There was no significant difference between boys' and girls' average performance on the general latent trait of reading comprehension, but the difference between boys' and girls' average reading proficiency in three literal and three informational testlets was significant, in favor of girls. The differential item functioning analysis based on the bi-factor model showed that many items exhibit uniform and non-uniform differential item functioning, with boys performing better on multiple-choice items and girls performing better on constructed-response items. In general, the results showed that in Iran's PIRLS 2011 testlets, the traits related to the two literal comprehension testlets were perceived differently by boys and girls, and these two testlets had more local item dependence among girls than among boys. The results also indicated a difference between the performance of Iranian boys and girls on the mixed-format PIRLS test.